Summary

The IFHE project (Investigation of Fully Homomorphic Encryption) originally set out
to examine the security of FHE schemes, and the lattice hard problems on which
they are based. This turned out to be relatively difficult, mainly due to the complexity
of building software libraries which could support the advanced mathematics needed
to perform experiments on modern multi-core processors. For example, the NTL
library is now (2015) able to support multi-threaded applications, but only if used with
bleeding-edge compilers on the latest Intel hardware. Thus the initial plan was
perhaps a little ahead of its time.

However, by leveraging additional sources of funding, most notably from the UK’s
EPSRC and the EU’s ERC, the Bristol team was able to make substantial headway
in other areas related to the PROCEED programme which were not originally
envisaged. These are centred around:

•  General  techniques  for  Fully  Homomorphic  Encryption 

•  Practical  methods  for  actively  secure  Multi-Party  Computation 

•  General  theory  behind  Multi-Party  Computation 

In  this  report  we  outline  the  various  improvements  and  advances  made  by  the  team 
in  Bristol. 

Introduction 

The PROCEED programme’s goal was to investigate different methods for
computing on encrypted data, in particular Fully Homomorphic Encryption and Multi-Party
Computation. Over the course of the programme the IFHE team contributed a
number of key advances in these two areas. We divide the contributions into four
key sub-areas:

1. General techniques for Fully Homomorphic Encryption.

2.  Practical  methods  for  actively  secure  Multi-Party  Computation. 

3.  General  theory  behind  Multi-Party  Computation. 

4.  Lattice  based  cryptanalysis. 

Due to the ability to leverage additional funding, this report encompasses the whole
of the activity in this space conducted by the Bristol team. Results whose research
was supported by DARPA funding are marked with three asterisks (***) before the
paragraph detailing the result. We feel that this will give the reader a
better notion of how the research funded by DARPA fits within the overall portfolio of
work in this space conducted in Bristol.

Perhaps the two key take-home messages from the work conducted by the Bristol
team are the greatly improved performance of actively secure MPC calculations (in
particular the development of the SPDZ protocol, described below) and the greatly
improved practical performance of FHE schemes. These two advances are not
unrelated, since the SPDZ protocol makes use of the advances in FHE schemes.
Indeed, one can see the SPDZ protocol as an example of where FHE technology can
already be used to improve the performance of other security protocols.


Methods,  Assumptions  and  Procedures 

The work conducted is a mixture of traditional cryptographic theory and applied
implementation work. This is a novel modus operandi, in that the Bristol group both
works on the theoretical development of new protocols and schemes (along with
their associated security proofs) and, hand-in-hand, builds research
prototypes to test the underlying performance of the resulting protocols. This is
combined with a deep knowledge of pure mathematics (number theory in particular),
which enables us to contribute to foundational work in the area.

This  combination  has  allowed  us  to  contribute  a  number  of  key  ideas  to  the  field  over 
the  course  of  the  project: 

• New techniques for Single Instruction Multiple Data (SIMD) operations in FHE
schemes. These are based on the structure of rings of cyclotomic integers.

• New techniques for bootstrapping FHE schemes. We presented two different
techniques for this: one based on extending earlier work on FHE schemes to
plaintext spaces embedded in p-adic rings, and one based on the use of different group
representations.

• Parameter size analysis for key generation. This has been key to developing
concrete instantiations of the SPDZ protocol suite.

• Our protocol design work has focused on efficient covertly secure offline
processing for the SPDZ protocol, and on algorithms implementing fast online
functionalities, for example floating point calculations and ORAM access.

Results  and  Discussion 

We  discuss  each  of  the  four  areas  mentioned  above  in  turn: 

General  techniques  for  Fully  Homomorphic  Encryption. 

Much  of  our  initial  work  in  PROCEED  centred  around  the  development  of  FHE 
techniques.  In  this  work  we  focused  on  developing  new  ways  of  utilizing  FHE 
techniques  to  enable  faster  and  more  elaborate  computations.  In  other  work, 
described  in  later  sections,  we  applied  these  FHE  techniques  to  enable  faster  MPC 
protocols,  and  we  examined  the  security  of  the  underlying  lattice  based 
cryptosystems. 

*** The first output from our DARPA-funded work on FHE was the development of a
method for SIMD evaluation for the original Gentry FHE scheme [1]. As a journal
publication, the paper took many years to appear in final form. The paper showed how
Gentry’s original scheme [2], as optimized by Smart and Vercauteren [3], could be
modified to operate on many plaintext elements at once. In addition to
this key finding, the authors also proposed a method to perform bootstrapping with
SIMD parallelism. The work in this paper has been highly influential, and the basic
idea has been exploited in all implementations of FHE schemes since, although
many of the specific techniques are now less important, since better
schemes than the original Gentry scheme now exist.
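To illustrate the SIMD idea without any encryption, the following Python sketch packs several values into one plaintext polynomial via the Chinese Remainder Theorem. It assumes a toy plaintext ring Z_17[X]/(X^4 + 1) rather than the characteristic-two rings used in [1]: because 17 ≡ 1 (mod 8), X^4 + 1 splits into four linear factors modulo 17, so the ring decomposes into four independent "slots", and one polynomial addition or multiplication acts on all four slots at once. All names and parameters are illustrative.

```python
T = 17    # toy plaintext modulus, chosen so that T = 1 (mod 2N)
N = 4     # toy ring degree: plaintexts live in Z_T[X]/(X^N + 1)
ZETA = 2  # a primitive 2N-th root of unity mod T (2**4 = 16 = -1 mod 17)
ROOTS = [pow(ZETA, 2 * i + 1, T) for i in range(N)]  # the N roots of X^N + 1 mod T

def poly_add(a, b):
    return [(x + y) % T for x, y in zip(a, b)]

def poly_mul(a, b):
    """Multiply in Z_T[X]/(X^N + 1): powers X^N and above wrap around with a minus sign."""
    res = [0] * N
    for i in range(N):
        for j in range(N):
            sign = 1 if i + j < N else -1
            k = (i + j) % N
            res[k] = (res[k] + sign * a[i] * b[j]) % T
    return res

def unpack(p):
    """CRT decoding: slot i holds p evaluated at ROOTS[i]."""
    return [sum(c * pow(r, k, T) for k, c in enumerate(p)) % T for r in ROOTS]

def pack(slots):
    """Inverse map: coefficient k is N^-1 * sum_i slots[i] * ROOTS[i]^-k (mod T)."""
    n_inv = pow(N, -1, T)
    return [n_inv * sum(s * pow(r, -k, T) for s, r in zip(slots, ROOTS)) % T
            for k in range(N)]

x = pack([1, 2, 3, 4])
y = pack([5, 6, 7, 8])
print(unpack(poly_add(x, y)))  # [6, 8, 10, 12]: slot-wise sums
print(unpack(poly_mul(x, y)))  # [5, 12, 4, 15]: slot-wise products mod 17
```

In an FHE scheme the same polynomial operations are carried out on ciphertexts, so each homomorphic addition or multiplication processes every slot in parallel.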

*** In order to support the above SIMD operations, new key generation techniques
were needed for the FHE scheme; these were introduced in [4, 5], which built on
earlier work in [6]. In particular, Fast Fourier Transform techniques were
used to simplify the key generation step for parameters in the Smart-Vercauteren
variant of Gentry’s FHE scheme, in order to enable SIMD operation of the scheme.
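In the Smart-Vercauteren setting, key generation involves computing the resultant of the cyclotomic polynomial X^N + 1 and the secret polynomial v(X). For X^N + 1 this resultant is simply the product of v evaluated at the N complex roots of X^N + 1, which an FFT computes all at once. The Python sketch below illustrates only that identity: the function names are hypothetical, it uses floating point (so it is trustworthy for toy sizes only), and the actual key generation methods of [4, 5, 6] work with exact large integers and compute further quantities.

```python
import cmath

def fft(a):
    """Recursive radix-2 FFT: out[s] = sum_k a[k] * exp(-2*pi*i*s*k/n); len(a) a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for s in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * s / n) * odd[s]
        out[s] = even[s] + t
        out[s + n // 2] = even[s] - t
    return out

def resultant_with_cyclotomic(v):
    """Res(X^N + 1, v) as the product of v(alpha) over the N roots alpha of X^N + 1.

    Twisting coefficient k by zeta^k, zeta = exp(i*pi/N), and applying a length-N FFT
    evaluates v at zeta * exp(-2*pi*i*s/N) for s = 0..N-1, which are exactly the
    roots of X^N + 1. N = len(v) must be a power of two.
    """
    n = len(v)
    twisted = [c * cmath.exp(1j * cmath.pi * k / n) for k, c in enumerate(v)]
    result = 1 + 0j
    for value in fft(twisted):
        result *= value
    return round(result.real)

print(resultant_with_cyclotomic([2, 1]))  # (2 + i)(2 - i) = 5
```

The FFT turns N separate polynomial evaluations, each costing O(N), into a single O(N log N) transform.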

*** In [5] we presented attacks on the SHE schemes of the time in a model in which
the attacker had access to a decryption oracle before any challenge ciphertext was
provided. Since all known FHE schemes include a decryption hint within the public
key, we were restricted to SHE schemes. In addition, since SHE
schemes are malleable, only so-called lunch-time chosen ciphertext attacks were
analysed. We presented a number of attacks, and showed how one particular
scheme could be immunised against such attacks using a novel lattice based
knowledge assumption.

*** Our focus on FHE then turned to a series of papers with Gentry and Halevi on the
BGV FHE scheme [7]. This scheme, based on Ring-LWE, supports the SIMD vector
operations described above, but it is both more efficient and based on a harder
problem than the initial FHE schemes discussed above. In our first work on this
scheme [8], we described how combining the SIMD addition and multiplication
operations with permutation operations induced from the Galois group of the
underlying number field enabled us to produce asymptotically efficient FHE
schemes. Whilst mainly theoretical in nature, the concept of
homomorphically applying the Galois action to the encrypted plaintext has turned out to
be highly important in practice for obtaining efficient general homomorphic
operations.
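The following Python sketch shows, in a toy unencrypted setting, why a Galois automorphism X -> X^k (k odd) acts as a permutation of the plaintext slots. It assumes the illustrative ring Z_17[X]/(X^4 + 1), not the parameters of [8]: each slot holds the polynomial's value at a root ζ^(2i+1) of X^4 + 1, and the automorphism sends that root to another root. In BGV the same map is applied to ciphertexts, followed by key switching. All names are hypothetical.

```python
T, N, ZETA = 17, 4, 2  # toy ring Z_17[X]/(X^4 + 1); 2 is a primitive 8th root of unity mod 17
ROOTS = [pow(ZETA, 2 * i + 1, T) for i in range(N)]

def unpack(p):
    """Slot i holds p evaluated at ROOTS[i]."""
    return [sum(c * pow(r, k, T) for k, c in enumerate(p)) % T for r in ROOTS]

def pack(slots):
    """Inverse of unpack: coefficient k is N^-1 * sum_i slots[i] * ROOTS[i]^-k (mod T)."""
    n_inv = pow(N, -1, T)
    return [n_inv * sum(s * pow(r, -k, T) for s, r in zip(slots, ROOTS)) % T
            for k in range(N)]

def apply_galois(p, k):
    """Automorphism X -> X^k (k odd): X^j becomes X^(jk mod 2N), using X^N = -1."""
    res = [0] * N
    for j, c in enumerate(p):
        e = (j * k) % (2 * N)
        if e < N:
            res[e] = (res[e] + c) % T
        else:
            res[e - N] = (res[e - N] - c) % T
    return res

p = pack([1, 2, 3, 4])
print(unpack(apply_galois(p, 3)))  # [2, 1, 4, 3]
print(unpack(apply_galois(p, 5)))  # [3, 4, 1, 2]
```

No slot value changes; only the order of the slots does, which is what makes these maps useful for moving data between slots.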

*** For plaintext rings in characteristic two, the Galois group not only provides a
mechanism to apply permutations to the plaintext slots, but also provides the
Frobenius automorphism, which enables very fast raising to powers of two. This
was exploited in our next paper [9], which presented the first large-scale computation
performed using SHE/FHE technology. We showed that the evaluation of a circuit as
complex as that of the AES function was possible, albeit rather slowly. I first proposed
the use of AES as a test bench circuit for computations on encrypted data in 2009 [10].
In subsequent works various authors have been able to
homomorphically evaluate the AES circuit in under five minutes, which is remarkable
given the state of the art at the start of the PROCEED programme. Our paper [9] won
the IBM Pat Goldberg Award for Best Paper in Computer Science for 2012.
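Why the Frobenius map is so cheap can be seen in plaintext arithmetic: in characteristic two, squaring a polynomial only moves its coefficients, since p(X)^2 = p(X^2). The Python sketch below checks this in F_2[X]/(X^7 - 1), a toy choice of ours (the cyclotomic polynomial Φ_7 divides X^7 - 1, so the identity also holds modulo Φ_7). Homomorphically, X -> X^2 is a Galois automorphism, so it can be applied to a ciphertext without a homomorphic multiplication; applying it j times gives p^(2^j). Names are hypothetical.

```python
M = 7  # toy choice: work in F_2[X]/(X^7 - 1); Phi_7 divides X^7 - 1

def mul_mod2(a, b):
    """Multiply two polynomials over F_2 modulo X^M - 1 (cyclic convolution, XOR as addition)."""
    res = [0] * M
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[(i + j) % M] ^= ai & bj
    return res

def frobenius(a):
    """Apply X -> X^2: move the coefficient of X^j to X^(2j mod M). No multiplication needed."""
    res = [0] * M
    for j, aj in enumerate(a):
        res[(2 * j) % M] = aj  # j -> 2j mod 7 is a permutation because gcd(2, 7) = 1
    return res

p = [1, 1, 0, 1, 0, 0, 1]  # 1 + X + X^3 + X^6
assert mul_mod2(p, p) == frobenius(p)  # p(X)^2 == p(X^2) in characteristic two
```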

*** In our next paper [11], Gentry, Halevi and I turned our attention to how to
perform efficient bootstrapping of BGV ciphertexts. We utilized a special ciphertext
modulus, which is close to a power of the plaintext modulus, so as to provide an
algebraic decryption operation (as opposed to the circuit based approaches of earlier
works). This enabled a more efficient procedure. Extending the technique to
bootstrapping SIMD encryptions required the development of techniques to
efficiently evaluate Fourier Transforms homomorphically.

*** Motivated by the need to perform homomorphic Fourier transforms, we then
turned our attention to a technique to homomorphically switch from one ring to
another. It turned out that such a technique has wider applicability,
in that it enables more efficient noise management by choosing different rings at
different points in the computation. Thus, with Gentry, Halevi and Peikert, we
developed a complete theory of this operation, which was described in [12] and [13].

*** In very recent work [14], two members of my group and I developed a
novel bootstrapping technique for BGV ciphertexts which has lower depth than all
previous techniques. The methodology makes use of the general representation
technique of [11], but it uses a new way of representing the various groups under
consideration. It is unclear at present whether this new technique will be practically
relevant, since the decrease in depth is paid for by an increase in the computational
complexity (i.e. the number of multiplications).

Outside of the DARPA project, a student in my group, working with colleagues from
Microsoft Research in Redmond, developed an improved variant of the NTRU-based
FHE scheme [15]. The paper presents a number of optimizations of the NTRU-based
scheme, as well as implementation results.


Practical  methods  for  actively  secure  Multi-Party  Computation. 

Probably the most important result from the IFHE project was the development of
the SPDZ protocol, an n-party MPC protocol which is actively/covertly secure.
It works in the preprocessing model, and the preprocessing utilizes the SIMD
optimizations of the BGV FHE scheme described above in [8]. After the
development of the basic protocol, our work (funded mainly by the EPSRC and ERC)
focused on building a large MPC system based on the basic protocol. In the
following paragraphs we elaborate on the various optimizations and improvements
obtained.

*** This entire line of work started with the joint work with Aarhus University
described in [16]. This paper took a number of ideas from earlier MPC protocols
developed by Aarhus (namely the use of preprocessing and MACs to obtain active
security), and greatly improved the overall efficiency and practicality of the methods.
As mentioned above, a key innovation was the use of FHE technology as a means to
obtain a performance improvement over protocols which did not utilize FHE
technology; thus this is probably the first example of FHE technology
developed in PROCEED being used to improve the performance of a security protocol.

In SPDZ, online circuit evaluations are done by secret-sharing the inputs and
having each party evaluate the circuit almost locally on its shares. This can be
done very efficiently: only multiplications require communication, and each
multiplication requires each party to communicate just two field elements. All circuit
values are augmented with a message authentication code (MAC). A party that
communicates incorrect values, or uses them in its local evaluation, will be detected
by the other parties. Previous MAC schemes required each party to use storage linear
in the total number of participants; SPDZ brings this down to a constant.
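The online phase described above can be sketched in Python. This is a minimal, insecure simulation under stated assumptions: all parties run in one process, a hypothetical trusted dealer stands in for the SHE-based preprocessing (producing the shared global MAC key α, authenticated inputs and Beaver multiplication triples), and each opened value's MAC is checked immediately, whereas SPDZ commits to and batches these checks. A value x is held as additive shares x_i together with shares γ_i of the MAC α·x; the field modulus and the number of parties are arbitrary toy choices.

```python
import random

P = 2**61 - 1   # toy prime field modulus (a Mersenne prime)
N_PARTIES = 3   # toy number of parties, all simulated in one process

def share(x):
    """Split x into N_PARTIES additive shares over F_P."""
    parts = [random.randrange(P) for _ in range(N_PARTIES - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

# Hypothetical trusted dealer replacing SPDZ's SHE-based preprocessing:
# it fixes a global MAC key alpha and gives party i a share alpha_i of it.
ALPHA = random.randrange(P)
ALPHA_SHARES = share(ALPHA)

class Auth:
    """Authenticated sharing: shares of x plus shares of its MAC alpha * x."""

    def __init__(self, x_shares, mac_shares):
        self.x, self.m = x_shares, mac_shares

    def __add__(self, other):
        # Local: every party adds its value shares and its MAC shares.
        return Auth([(a + b) % P for a, b in zip(self.x, other.x)],
                    [(a + b) % P for a, b in zip(self.m, other.m)])

    def scale(self, k):
        # Local: multiply by a public constant k.
        return Auth([k * a % P for a in self.x], [k * a % P for a in self.m])

    def add_const(self, k):
        # Local: party 0 adds k to its value share; party i adds alpha_i * k to its MAC share.
        x = list(self.x)
        x[0] = (x[0] + k) % P
        return Auth(x, [(m + a * k) % P for m, a in zip(self.m, ALPHA_SHARES)])

def dealer_input(x):
    """Dealer hands out an authenticated sharing of x (a simplification of SPDZ input)."""
    return Auth(share(x % P), share(ALPHA * x % P))

def dealer_triple():
    """Dealer hands out a Beaver triple <a>, <b>, <c> with c = a * b."""
    a, b = random.randrange(P), random.randrange(P)
    return dealer_input(a), dealer_input(b), dealer_input(a * b)

def open_and_check(v):
    """Broadcast value shares to reveal y, then check the MAC without revealing alpha."""
    y = sum(v.x) % P
    sigma = [(m - a * y) % P for m, a in zip(v.m, ALPHA_SHARES)]  # party i's sigma_i
    if sum(sigma) % P != 0:  # the sigma_i sum to alpha * (x - y), i.e. 0 when y == x
        raise ValueError("MAC check failed")
    return y

def multiply(u, v):
    """Beaver multiplication: open eps = x - a and delta = y - b, then finish locally."""
    a, b, c = dealer_triple()
    eps = open_and_check(u + a.scale(P - 1))
    delta = open_and_check(v + b.scale(P - 1))
    # x*y = c + eps*b + delta*a + eps*delta
    return (c + b.scale(eps) + a.scale(delta)).add_const(eps * delta % P)

x, y = dealer_input(6), dealer_input(7)
print(open_and_check(multiply(x, y) + x))  # 48
```

Additions and multiplications by public constants are purely local; each multiplication opens two masked values (ε and δ), matching the two field elements per party mentioned above. Tampering with any share makes the MAC check fail except with probability about 1/P.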

*** As an early test of the SPDZ system we implemented a system to evaluate the
AES functionality [17]; again, the choice of AES as a test case was due to our
proposing this in [10]. The initial results were relatively good, and comparable to
systems with a weaker security guarantee. With current techniques, however, these
run times can be considerably improved.

*** The preprocessing of SPDZ relies on somewhat homomorphic encryption (SHE).
This SHE scheme allows one to homomorphically add a number of ciphertexts, and
to perform a single homomorphic multiplication. This is in contrast with fully
homomorphic encryption, which allows an unbounded number of multiplications. Key
open problems left by the initial paper [16] were that the protocol did not enable reactive
computation, and that no procedure was given to agree on the homomorphic encryption
public/private key pair. These problems were solved in [18], where a method was given
for the parties to agree on a cryptographic key. Support for reactive computations was
also given: exploiting the secret-sharing approach, it was shown how to check MACs without
having to reveal the MAC key; hence, after one online computation is
done, the participants can carry on with a different computation, using the remaining
authenticated randomness generated in the preprocessing under the same secret MAC key. The
online evaluations can even occur concurrently, since SPDZ operates in the UC
security framework. This paper won the Best Paper Award at ESORICS 2013.

The main advantage of the SPDZ protocol is its very efficient online phase, which
only requires standard symmetric and information-theoretic primitives to implement.
Since this is the only part of the protocol dependent on the parties' inputs to the
function, the running time of the online phase determines the latency a user
experiences when waiting for the output, and hence it is crucial to implement it in the
most efficient way possible. Moreover, to be able to implement complex functions in
MPC we need a suitable set of tools to compile and run programs written in some
high-level language. To do this, we designed and implemented an MPC virtual
machine for the online phase, which parses and executes a special MPC-based set
of basic instructions. We then created a compiler that reads Python-like programs
and performs various optimizations to output efficient bytecode that can be run by
the VM. One of the key optimizations is to minimize the number of rounds of
communication in a given program by analyzing the control flow graph, which greatly
reduces latency. Using this toolchain, we created very efficient implementations of
common functions, including the crypto-specific benchmarks AES and SHA-1, as well as
other functions such as sorting and floating point arithmetic, which have applications
in more general scenarios. The entire system is described in [19]. We are continuing
to improve and extend the features of the compiler, and in the future would like a
system that can formally verify the correctness and security of protocols.
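To make the round-minimization idea concrete, the following Python sketch assigns each multiplication gate of a straight-line program to a communication round equal to its multiplicative depth, so that all multiplications whose inputs are ready at the same time share one round. It is a hypothetical illustration, not the algorithm of the compiler in [19]; the gate representation is an assumption. Additions are local in SPDZ and so never need a round of their own.

```python
from collections import defaultdict

def schedule_rounds(gates):
    """Group multiplication gates into communication rounds.

    gates maps a gate name to (op, input_names), listed in topological order;
    op is 'input', 'add' or 'mul'. A gate's round is its multiplicative depth:
    additions are local, while each 'mul' needs one round after its inputs are ready.
    """
    depth = {}
    for name, (op, inputs) in gates.items():
        d = max((depth[i] for i in inputs), default=0)
        depth[name] = d + 1 if op == "mul" else d
    rounds = defaultdict(list)
    for name, (op, _) in gates.items():
        if op == "mul":
            rounds[depth[name]].append(name)
    return [rounds[d] for d in sorted(rounds)]

gates = {
    "a": ("input", []), "b": ("input", []), "c": ("input", []), "d": ("input", []),
    "ab": ("mul", ["a", "b"]),
    "cd": ("mul", ["c", "d"]),
    "s": ("add", ["ab", "cd"]),
    "t": ("mul", ["s", "a"]),
}
print(schedule_rounds(gates))  # [['ab', 'cd'], ['t']]
```

Here three multiplications need only two rounds instead of three, because the two independent products are opened together.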


Traditionally, MPC only allows one to execute binary or arithmetic circuits. This
makes advanced data structures, such as arrays, very inefficient
because one has to scan the entire array for every access. A technique
called oblivious RAM (ORAM) facilitates more efficient data structures in
MPC. The scope of ORAM goes beyond MPC: in general it hides the access pattern
in a client-server model. A recent result on ORAM greatly simplified the necessary
computation. This allowed for the first implementation of arrays and priority queues
in secret-sharing MPC [20]. Both are used in the Bristol implementation of an algorithm
for shortest paths in a graph (Dijkstra's algorithm), which is significantly faster than
previous implementations. For our implementation, we had to improve various aspects
of our platform, for example the support of non-recursive functions and better
parallelization. Future research in this area will focus on implementing general RAM
programs in MPC and aspects thereof such as cost-privacy trade-offs.
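The linear-scan baseline that ORAM improves upon can be written down directly. In the Python sketch below (plaintext, for illustration only; in MPC both the entries and the index would be secret-shared, and the comparison j == index would itself be a secure equality protocol), every read and write touches every entry using only additions and multiplications, so the access pattern reveals nothing about the index but each access costs O(n) work. The function names are hypothetical.

```python
def oblivious_read(array, index):
    """Read array[index] by touching every entry: sum_j array[j] * [j == index]."""
    return sum(v * (j == index) for j, v in enumerate(array))

def oblivious_write(array, index, value):
    """Rewrite every entry; only the one at index actually changes."""
    return [v + (j == index) * (value - v) for j, v in enumerate(array)]

print(oblivious_read([10, 20, 30], 1))       # 20
print(oblivious_write([10, 20, 30], 2, 99))  # [10, 20, 99]
```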

*** Whilst SPDZ is highly suited to arithmetic circuits, it is less well suited to binary
circuits. For binary circuits the best protocol seems to be TinyOT [21]. However,
TinyOT is only suited to two players. In [22] we extended the TinyOT protocol to the
multi-party case. The protocol we describe allows actively secure evaluation of
Boolean circuits in the dishonest majority setting with static corruptions. The idea is
to apply an information-theoretic MAC to the oblivious transfer (OT)
based GMW protocol, and to produce in the offline phase a large number of random
authenticated OTs, which are then used to perform Beaver-style multiplications in
the online phase. The efficiency of the offline phase is guaranteed by a variant of an
OT-extension protocol. The main tool we use is an extension of the authenticated bit
(aBit) primitive from [21] from the two-party to the multi-party setting, obtained by
combining, in a nontrivial way, ideas from [21], [18] and [16]. In particular, we use a
global unknown shared key instead of pairwise keys for bit authentication, and
then, by executing the pairwise aBit protocol, we are able to obtain secret-shared
random bits, together with shared MACs, among all n parties.
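The shape of an authenticated bit can be shown in a short Python sketch, assuming the TinyOT-style MAC in which a verifier holding a global key Δ and a local key K accepts a bit x only together with the MAC M = K ⊕ x·Δ. The sketch covers a single prover/verifier pair with toy 64-bit keys; it omits the OT-extension protocol that actually generates these values and the multi-party combination of [22]. Function names are hypothetical.

```python
import secrets

KAPPA = 64  # toy MAC length in bits

def authenticate(x, delta):
    """Return (key, mac) with mac = key XOR (x * delta); verifier keeps key, prover keeps mac."""
    key = secrets.randbits(KAPPA)
    return key, key ^ (delta if x else 0)

def verify(x, mac, key, delta):
    """Check an opened bit x against its MAC under the verifier's key and global delta."""
    return mac == key ^ (delta if x else 0)

delta = secrets.randbits(KAPPA)  # global key, known only to the verifier
k1, m1 = authenticate(1, delta)
k2, m2 = authenticate(0, delta)
# XOR of authenticated bits is authenticated by the XOR of keys and MACs: no interaction needed.
assert verify(1 ^ 0, m1 ^ m2, k1 ^ k2, delta)
```

Because the MAC is linear over XOR, XOR gates are free; to open a flipped bit, a cheating prover would have to add Δ to its MAC, and Δ is unknown to it.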

After publication we realised that the paper [22] contained a minor bug. We are
currently working with the Aarhus group on a joint paper which merges the work in
[21] and [22], and corrects the bug in the published version of [22].

General theory behind Multi-Party Computation.

As well as the more practical aspects of MPC, we also examined more theoretical
aspects. Much of the work in this area was carried out by my two post-docs, Choudhury
and Patra, who were funded by the EPSRC and have since returned to Bangalore, where
they now hold permanent academic positions.

*** In [23] we examined the situation of a server farm with thousands of nodes which
wants to run an MPC computation in which a given (smallish) percentage of the nodes
are corrupt. We present a protocol which does not require full communication between
all nodes at all times, yet still obtains full active, and robust, security. This is done
via a sequence of checkpoints; between consecutive checkpoints, an actively
secure dishonest majority protocol is run among a suitably large sub-committee. By
selecting the dishonest majority sub-protocol so that we can detect cheaters, we are
then able to apply standard player elimination strategies so as to obtain an overall
robust protocol.
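One way to see what a "suitably large" sub-committee means is a simple counting argument, sketched below in Python under assumptions that are ours rather than taken from [23]: the sub-committee is a uniformly random subset of the n nodes, t of which are corrupt, and the dishonest majority sub-protocol remains secure as long as at least one committee member is honest. The committee then fails only if every member is corrupt, which happens with probability C(t, c)/C(n, c); the 2^-40 target is an illustrative choice.

```python
from math import comb

def committee_size(n, t, sec_bits=40):
    """Smallest c such that a uniformly random c-member committee drawn from n nodes,
    t of them corrupt, is entirely corrupt with probability at most 2^-sec_bits.
    Assumes 0 <= t < n."""
    c = 1
    # Pr[all c members corrupt] = C(t, c) / C(n, c); compare as integers to avoid rounding.
    # The loop stops by c = t + 1 at the latest, because C(t, t + 1) = 0.
    while comb(t, c) * 2**sec_bits > comb(n, c):
        c += 1
    return c

print(committee_size(1000, 100))  # committee size for 10% corruption among 1000 nodes
```

The required committee size depends on the corruption fraction and the security target rather than on n itself, which is why the sub-committees can stay small even for very large server farms.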

*** On the one hand, FHE allows us to perform computation on encrypted data using very
little communication but a lot of computational resources; on the other hand, standard MPC
protocols require few computational resources but a lot of communication. In [24] we
presented a technique which interpolates between the FHE-MPC protocol of Gentry
and more standard MPC protocols. The protocol enables one to select a depth of
sub-circuit which is dealt with via the FHE part, while the rest is done via an MPC
protocol. This division of the circuit into layers is reminiscent of the previous paper
[23].

*** In most MPC protocols one assumes that the function to be computed is public,
and hence known to all parties. However, there are some situations where one might
want to keep the function private. Treating the function as one player's input is clearly
captured by an MPC protocol which enables one to implement a Universal Turing
machine, so the problem is purely one of efficiency. In [25] a protocol is given
which is essentially optimal in the case of active adversaries. Active security is
obtained via the use of MACs, as in the SPDZ protocol; however, the underlying MPC
protocol is very different in nature.

Related to the PROCEED programme was a series of papers by my post-docs
Choudhury and Patra on aspects of MPC in the case of asynchronous networks.
Almost all practical MPC protocols assume that the underlying network is
synchronous; however, real networks are asynchronous. This is a particular problem
in MPC, as a receiver can never know whether a message it did not receive was
delayed by the network or never sent by a corrupted sender. Thus there is a whole
sub-area of MPC research (currently mostly theoretical) which deals with issues related
to asynchronicity. Bristol’s work in this area during the time of the PROCEED
programme resulted in the following outputs: [26] [27] [28] [29] [30] [31].

Finally, [32] studied secure two-party computation with single adaptive corruptions in
the non-erasure model, where at most one party is adaptively corrupted. To
distinguish this notion from fully adaptive security, where both parties may get
corrupted, we denote it by one-sided adaptive security. Our goal in this work is to
make progress in the study of the efficiency of two-party protocols with one-sided
adaptive security. Our measure of efficiency is the number of public key encryption
operations. Loosely speaking, our primitives are parameterized by a public key
encryption scheme for which we count the number of key generation, encryption and
decryption operations. More concretely, these operations are captured by the
number of exponentiations in several important groups.


Lattice  based  cryptanalysis. 

Lattice-based cryptography is one of the main candidates for cryptography that
remains secure against cryptanalysis using a quantum computer. So far, neither
cryptographers nor quantum algorithms researchers have found a quantum attack on
lattice problems that provides a speed-up similar to that of Shor's algorithm for RSA and
discrete logarithms. However, Grover's algorithm is also important for cryptography: it
implies, for example, that keys in the symmetric setting need to be twice as long.
Of particular relevance to PROCEED is the fact that all FHE schemes
currently known rely for their security on the hardness of various lattice based
problems.

In [33], my students examined the effects of using Grover's algorithm inside the
so-called sieving algorithms for solving the shortest vector problem in lattices.
Previously, there had been one work applying Grover to a single (and different)
algorithm, but we extended this to the whole class of sieving algorithms. Our analysis
shows that the application of Grover allows sieving-type algorithms to be asymptotically
faster than the best classical algorithms (which do not use sieving). As a rule of thumb,
it appears that the keys need to be increased by a factor of 4/3 to achieve, against
quantum attacks, the same level of security as before against classical attacks.
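The 4/3 rule of thumb can be read as a simple scaling argument. Assuming, purely for illustration, that classical and quantum sieving on an n-dimensional lattice cost 2^{cn} and about 2^{(3/4)cn} operations respectively (c is a placeholder constant, not a value taken from [33]), a quantum attacker facing dimension 4n/3 does as much work as a classical attacker facing dimension n:

```latex
T_{\mathrm{cl}}(n) = 2^{c\,n}, \qquad
T_{\mathrm{q}}(n) \approx 2^{\frac{3}{4} c\,n}
\quad\Longrightarrow\quad
T_{\mathrm{q}}\!\left(\tfrac{4}{3}\,n\right) \approx 2^{\frac{3}{4} c \cdot \frac{4}{3} n} = 2^{c\,n} = T_{\mathrm{cl}}(n).
```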

*** My student and I further examined the relation between different parameters and
security in [34], but for the best known classical algorithm. Previous work had
always discarded the dimension parameter as a second-order term for security.
However, we are interested in FHE schemes, where the dimension needs to be very large
for the functionality of the scheme. We designed an approach that also takes the
dimension into account, which led to a decrease in parameters for the same
security when we applied it to several FHE schemes.

Lattices are not only important for the construction of cryptography. The history of
lattices in cryptanalysis stretches even further back, to the breaking of
knapsack-based schemes and breaking RSA with partial information on the key. It
turns out that lattice techniques are very effective at turning many different kinds of
partial information into full information, e.g., the secret key.

***  In  joint  work  with  colleagues  from  Adelaide  [35]  we  combined  a  side-channel 
attack  on  ECDSA  in  OpenSSL  with  lattice  algorithms  to  recover  the  secret  key.  We 
modified  the  previous  approaches  to  combine  instances  where  the  amount  of  partial 
information  varies  per  instance.  Our  results  show  that  such  attacks  can  be  very 
efficient,  both  in  terms  of  the  number  of  signatures  required  and  the  time  required. 

Conclusion 

As can be seen, the IFHE project has created a large number of outputs in a range of
topics related to the PROCEED programme. The main outputs have shown that
computation on encrypted data is now much closer to being deployable than it was
at the start of the programme. We have identified
areas in which FHE can be used as a performance-enhancing technology (e.g. the
SPDZ protocol), and we have shown that active security can be achieved for MPC
protocols with very little performance overhead compared to passively secure
protocols.

Our work in this space has attracted considerable interest from partners outside of
the PROCEED programme. Follow-up work is continuing in the EU-funded
PRACTICE project on MPC, and in the HEAT project on FHE. We have also
conducted a joint project with Thales, funded by the UK’s DSTL, into applications of
MPC within the UK defence sector. Finally, with Bar-Ilan University we have formed
a company, Dyadic Security, which is looking at applications of MPC to breach
mitigation on computer networks.
FHE  Fully Homomorphic Encryption
LWE  Learning With Errors
MPC  Multi-Party Computation
SIMD  Single Instruction Multiple Data

SPDZ
The following appendix contains the papers which are the output of the project. For
each paper we give the full version from the IACR ePrint Archive rather than the
extended abstract (the version usually published in conference proceedings).
Abstract. At PKC 2010 Smart and Vercauteren presented a variant of Gentry’s fully homomorphic public key encryption
scheme and mentioned that the scheme could support SIMD style operations. The slow key generation process of
the Smart-Vercauteren system was then addressed in a paper by Gentry and Halevi, but their key generation method
appears to exclude the SIMD style operation alluded to by Smart and Vercauteren. In this paper, we show how to select
parameters to enable such SIMD operations. As such, we obtain a somewhat homomorphic scheme supporting both
SIMD operations and operations on large finite fields of characteristic two. This somewhat homomorphic scheme can be
made fully homomorphic in a naive way by recrypting all data elements separately. However, we show that the SIMD
operations can be used to perform the recrypt procedure in parallel, resulting in a substantial speed-up. Finally, we
demonstrate how such SIMD operations can be used to perform various tasks by studying two use cases: implementing
AES homomorphically and encrypted database lookup.


1  Introduction 

For many years a long-standing open problem in cryptography has been the construction of a fully homomorphic encryption
(FHE) scheme. The practical realisation of such a scheme would have a number of consequences, such as
computation on encrypted data held on an untrusted server. In 2009 Gentry [10, 11] came up with the first construction
of such a scheme based on ideal lattices. Soon after Gentry’s initial paper appeared, two other variants were presented
[6, 23]; the method of van Dijk et al. [6] is a true variant of Gentry’s scheme and relies purely on the arithmetic of
the integers; on the other hand, the scheme of Smart and Vercauteren [23] is a specialisation of Gentry’s scheme to a
particular set of parameters.

All  schemes  make  use  of  Gentry’s  idea  of  first  producing  a  somewhat  homomorphic  encryption  scheme  and  then 
applying  a  bootstrapping  process  to  obtain  a  complete  FHE  scheme.  This  bootstrapping  process  requires  a  “dirty” 
ciphertext  to  be  publicly  reencrypted  into  a  “cleaner”  ciphertext.  This  requires  that  the  somewhat  homomorphic  scheme 
can  homomorphically  implement  its  own  decryption  circuit,  and  so  must  be  able  to  execute  a  circuit  of  a  given  depth. 

Gentry and Halevi [12] presented an optimized version of the Smart-Vercauteren variant. In particular, the optimized version has an efficient key generation procedure based on the Fast Fourier Transform and a simpler decryption circuit. These two major optimizations, along with some other minor ones, allow Gentry and Halevi to actually implement a "toy" FHE scheme, including the ciphertext cleaning operation.
Smart and Vercauteren mentioned in [23] that their scheme can be adapted to support SIMD (Single-Instruction Multiple-Data) style operations on non-trivial finite fields of characteristic two, as opposed to operations on single bits, as long as the parameters are chosen appropriately. However, the parameters proposed in both [12] and [23] do not allow such SIMD operations, nor direct operation on elements of finite fields of characteristic two of degree greater than one. In particular, the efficient key generation method of [12] precludes the use of parameters which would support SIMD style operations. Using fully homomorphic SIMD operations would be an advantage in any practical system since FHE schemes usually embed relatively small plaintexts within large ciphertexts. Allowing each ciphertext to represent a number of independent plaintexts would therefore enable more efficient use of both space and computational resources.

In this paper we investigate the use of SIMD operations in FHE systems in more depth. In particular we show how by adapting the parameter settings of [12, 23] one can obtain the benefits of SIMD operations, whilst still maintaining many of the important efficiency improvements obtained by Gentry and Halevi. We thus obtain a somewhat homomorphic scheme supporting SIMD operations, and operations on large finite fields of characteristic two. We then discuss how one can use the SIMD operations to perform the recrypt procedure in parallel. In addition we explain how such SIMD operations could be utilized to perform a number of interesting higher level operations, such as performing AES encryption homomorphically and searching an encrypted database on a remote server.

The  paper  is  structured  as  follows.  Section  2  presents  some  basic  facts  about  finite  fields  and  algebras  defined  as 
quotients  of  polynomial  rings.  Section  3  explains  how  these  algebras  allow  us  to  create  a  somewhat  homomorphic 
encryption  scheme  whose  message  space  consists  of  multiple  parallel  copies  of  a  given  finite  field  of  characteristic 
two.  Section  4  describes  a  recryption  procedure  for  the  somewhat  homomorphic  scheme  that  preserves  the  underlying 
message  space  structure.  Section  5  contains  our  main  contribution,  namely,  a  recryption  procedure  that  makes  use  of 
the SIMD operations. This new procedure significantly reduces the cost of recryption. To justify our claims, Section 7
presents  implementation  timings  for  a  toy  example.  Finally,  Section  8  gives  possible  applications  of  the  SIMD  structure 
of  our  FHE  scheme,  including  bit-sliced  implementations  of  algorithms,  such  as  performing  AES  encryption  using  an 
encrypted  key,  and  database  search. 

Since the appearance of the current paper on IACR ePrint in March 2011 the basic idea of utilizing SIMD operations has been used by a number of authors, and the methods in this paper have been extended. In particular in [22] the authors present further optimizations of the key generation method proposed in this paper. It had already been noted in [2] that the ring-LWE based FHE schemes also possess exactly the same form of SIMD operation as in this paper. In a series of works [13-16] Gentry, Halevi and Smart make extensive use of FHE-based SIMD operations in a number of contexts related to the BGV cryptosystem [2]. In [13] they show how using SIMD operations combined with the BGV scheme allows one to obtain an asymptotically efficient FHE scheme; then in [14] they show (among other results) how a SIMD evaluation of the FFT can be used to possibly improve bootstrapping; then in [15] they actually implement the example application we present in Section 8.2; finally in [16] they show how one can, in SIMD fashion, switch the underlying finite field over which one is working to a smaller one, thus obtaining performance improvements as one descends through a levelled FHE scheme. In [9] the authors also utilize the SIMD mode of the basic somewhat homomorphic BGV scheme to achieve a more efficient offline phase for a multi-party computation protocol.


1.0.1  Notations 

We end this introduction by presenting the notation that will be used throughout this paper. Assignment to variables will be denoted by $x \leftarrow y$. If $A$ is a set then $x \xleftarrow{\$} A$ implies that $x$ is selected from $A$ using the uniform distribution. If $A$ is an algorithm then $x \leftarrow A$ implies that $x$ is obtained from running $A$, with the resulting probability distribution being induced by the random coins of $A$. For integers $x, d$, we denote by $[x]_d$ the reduction of $x$ modulo $d$ into the interval $[-d/2, d/2)$. If $y$ is a vector then we let $y_i$ denote the $i$'th element of $y$.
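The centred reduction $[x]_d$ is used repeatedly in the decryption and recryption procedures. A minimal Python helper is shown below; the name `centered_mod` is our own, and it relies on Python's `%` always returning a value in $[0, d)$ for $d > 0$.

```python
def centered_mod(x: int, d: int) -> int:
    """Return [x]_d, the representative of x modulo d lying in [-d/2, d/2)."""
    r = x % d                      # 0 <= r < d for d > 0
    return r - d if 2 * r >= d else r


print(centered_mod(5, 7), centered_mod(-9, 7), centered_mod(4, 8))
```

Using `2 * r >= d` rather than a division keeps the test exact for both odd and even moduli.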

Polynomials over an indeterminate $X$ will (usually) be denoted by uppercase roman letters, e.g. $F(X)$. We make an exception for the cyclotomic polynomials, which are as usual denoted by $\Phi_m(X)$. Elements of finite fields and number fields defined by a polynomial $F(X)$, i.e. elements of $\mathbb{F}_2[X]/F(X)$ and $\mathbb{Q}[X]/F(X)$, can also be represented as polynomials in some fixed root of $F(X)$ in the algebraic closure of the base field. We shall denote such polynomials by lower case greek letters, with the fixed root (being an element of the field) also being denoted by a lower case greek letter; for instance $\gamma(\theta)$ where $F(\theta) = 0$. When the underlying root of $F(X)$ is clear we shall simply write $\gamma$.

For a polynomial $F(X) \in \mathbb{Q}[X]$ we let $\|F(X)\|_\infty$ denote the $\infty$-norm of the coefficient vector, i.e. the maximum coefficient in absolute value. Similarly, for an element $\gamma \in \mathbb{Q}[X]/F(X)$ we write $\|\gamma\|_\infty$ for $\|\gamma(X)\|_\infty$ where $\gamma(X)$ is the corresponding unique polynomial of degree $< \deg(F)$. If $F(X) \in \mathbb{Q}[X]$ then we let $\lfloor F(X) \rceil$ denote the polynomial in $\mathbb{Z}[X]$ obtained by rounding the coefficients of $F(X)$ to the nearest integer. Similarly, for an element $\gamma \in \mathbb{Q}[X]/F(X)$ we write $\lfloor \gamma \rceil$ for $\lfloor \gamma(X) \rceil$.
2 Fields and Homomorphisms


To present the SIMD operations in full generality and to understand how they can be utilized we first set up a number of finite fields and homomorphisms between them. We let $F(X) \in \mathbb{F}_2[X]$ denote a monic polynomial of degree $N$ that we assume splits into exactly $r$ distinct irreducible factors of degree $d = N/r$:

$$F(X) := \prod_{i=1}^{r} F_i(X).$$

In practice $F(X)$ will be the reduction modulo two of a specially chosen monic irreducible polynomial over $\mathbb{Z}$. This polynomial $F(X)$ defines a number field $K = \mathbb{Q}(\theta) = \mathbb{Q}[X]/(F)$, where $\theta$ is some fixed root in the algebraic closure of $\mathbb{Q}$.


Let $A$ denote the algebra $A := \mathbb{F}_2[X]/(F)$; then by the Chinese Remainder Theorem we have the natural isomorphisms

$$A \cong \mathbb{F}_2[X]/(F_1) \times \cdots \times \mathbb{F}_2[X]/(F_r) \cong \mathbb{F}_{2^d} \times \cdots \times \mathbb{F}_{2^d},$$

i.e. $A$ is isomorphic to $r$ copies of the finite field $\mathbb{F}_{2^d}$. Arithmetic in $A$ will be defined by polynomial arithmetic in the indeterminate $X$ modulo the polynomial $F(X)$. Our goal in this section is to relate arithmetic in $A$ explicitly with the elements in subfields of the $\mathbb{F}_{2^d}$.

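We let $\theta_i$ denote a fixed root of $F_i(X)$ in the algebraic closure of $\mathbb{F}_2$. To aid notation we define $\mathbb{L}_i := \mathbb{F}_2[X]/(F_i)$ and note that all the $\mathbb{L}_i$ are isomorphic as fields, where the isomorphisms are explicitly given by

$$\psi_{i,j} : \mathbb{L}_i \to \mathbb{L}_j, \qquad \alpha(\theta_i) \mapsto \alpha(\rho_{i,j}(\theta_j)),$$

with $\rho_{i,j}(\theta_j)$ a fixed root of $F_i$ in $\mathbb{L}_j$, i.e. we have $F_i(\rho_{i,j}(X)) \equiv 0 \pmod{F_j(X)}$.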

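For each divisor $n$ of $d$, the finite field $\mathbb{K}_n := \mathbb{F}_{2^n}$ is contained in $\mathbb{F}_{2^d}$. We assume a fixed canonical representation for $\mathbb{K}_n$ as $\mathbb{F}_2[X]/K_n(X)$ for some irreducible polynomial $K_n(X) \in \mathbb{F}_2[X]$ of degree $n$, which is often fixed by the application. We let $\varphi$ denote a fixed root of $K_n(X)$ in the algebraic closure of $\mathbb{F}_2$. Since $\mathbb{K}_n$ is contained in each of the $\mathbb{L}_i$ defined above, we have explicit homomorphic embeddings given by

$$\iota_{n,i} : \mathbb{K}_n \to \mathbb{L}_i, \qquad \alpha(\varphi) \mapsto \alpha(\sigma_{n,i}(\theta_i)),$$

with $\sigma_{n,i}(\theta_i)$ a fixed root of $K_n(X)$ in $\mathbb{L}_i$, i.e. $K_n(\sigma_{n,i}(X)) \equiv 0 \pmod{F_i(X)}$. Note that the above mapping is linear in the coefficients of $\alpha(\varphi)$.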

Combining the above homomorphic embedding with the Chinese Remainder Theorem, we obtain a homomorphic embedding of $l \le r$ copies of $\mathbb{K}_n$ into the algebra $A$ via

$$\Gamma_{n,l} : \mathbb{K}_n^l \to A, \qquad (\alpha_1(\varphi), \ldots, \alpha_l(\varphi)) \mapsto \sum_{i=1}^{l} \iota_{n,i}(\alpha_i(\varphi)) \cdot H_i(X) \cdot G_i(X).$$

The polynomials $H_i(X)$ and $G_i(X)$ are given by the Chinese Remainder Theorem and are defined as

$$H_i(X) := F(X)/F_i(X) \qquad \text{and} \qquad G_i(X) := 1/H_i(X) \pmod{F_i(X)}.$$

We shall denote componentwise addition and multiplication of elements in $\mathbb{K}_n^l$ by $k_1 + k_2$ and $k_1 \times k_2$. As such we have constructed two equivalent methods of computing with elements in $\mathbb{K}_n^l$: the first method simply computes componentwise on vectors of $l$ elements in $\mathbb{K}_n$, whereas the second method first maps all inputs to the algebra $A$ using $\Gamma_{n,l}$, performs computations in $A$ and finally maps back to $\mathbb{K}_n^l$ via $\Gamma_{n,l}^{-1}$. Note that by construction $\mathbb{K}_n^l$ and $\Gamma_{n,l}(\mathbb{K}_n^l)$ are isomorphic, so that $\Gamma_{n,l}^{-1}$ is always well defined on the result of the computation.
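The equivalence of the two methods can be checked on a tiny example. The sketch below makes several assumptions of our own. It takes $F(X) = \Phi_7(X) \bmod 2 = (X^3+X+1)(X^3+X^2+1)$, so $N = 6$, $d = 3$, $r = 2$. It stores each slot in its own field $\mathbb{L}_i$, which means it takes $n = d$ and omits the embeddings $\iota_{n,i}$. It also encodes $\mathbb{F}_2[X]$ polynomials as Python integers. The CRT idempotents $H_i G_i \bmod F$ are built exactly as defined above. One multiplication in $A$ then multiplies both slots at once. All helper names are hypothetical.

```python
# GF(2)[X] polynomials as Python ints: bit k is the coefficient of X^k.

def pmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[X]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a: int, m: int) -> tuple[int, int]:
    """Quotient and remainder of a divided by m (m != 0) in GF(2)[X]."""
    q, dm = 0, m.bit_length()
    while a.bit_length() >= dm:
        shift = a.bit_length() - dm
        q ^= 1 << shift
        a ^= m << shift
    return q, a

def pinv(a: int, m: int) -> int:
    """Inverse of a modulo m via the extended Euclidean algorithm."""
    r0, r1, s0, s1 = m, pdivmod(a, m)[1], 0, 1   # invariant: s_k * a = r_k (mod m)
    while r1:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, s0 ^ pmul(q, s1)
    if r0 != 1:
        raise ValueError("not invertible")
    return pdivmod(s0, m)[1]

F = 0b1111111                  # Phi_7(X) = X^6 + ... + X + 1 (mod 2)
FACTORS = [0b1011, 0b1101]     # F_1 = X^3 + X + 1, F_2 = X^3 + X^2 + 1

E = []                         # CRT idempotents e_i = H_i * G_i mod F
for Fi in FACTORS:
    Hi, rem = pdivmod(F, Fi)   # H_i = F / F_i
    assert rem == 0
    Gi = pinv(Hi, Fi)          # G_i = 1 / H_i mod F_i
    E.append(pdivmod(pmul(Hi, Gi), F)[1])

def encode(slots: list[int]) -> int:
    """Map one value per slot (slot i living in L_i) to a single element of A."""
    acc = 0
    for s, e in zip(slots, E):
        acc ^= pmul(s, e)
    return pdivmod(acc, F)[1]

def decode(a: int) -> list[int]:
    """Recover the slots by reducing modulo each factor F_i."""
    return [pdivmod(a, Fi)[1] for Fi in FACTORS]

x, y = [0b010, 0b110], [0b011, 0b101]
prod = pdivmod(pmul(encode(x), encode(y)), F)[1]   # a single multiplication in A
print(decode(prod))                                 # slot-wise products in L_1, L_2
```

Addition in $A$ is XOR of the integer encodings, so it is likewise slot-wise.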

The goal of this paper is to produce a fully homomorphic encryption scheme that allows us to work via SIMD operations on $l$ copies of $\mathbb{K}_n$ at a time, for all $n$ dividing $d$, by computing in the algebra $A$. In particular, this enables us to support SIMD operations both in $\mathbb{F}_2$ and $\mathbb{F}_{2^d}$. To make things concrete the reader should consider the example of $F(X)$ being the 3485-th cyclotomic polynomial. In this situation the polynomial $F(X)$ has degree $N = \phi(3485) = 2560$, and modulo two it factors into 64 polynomials each of degree 40. This polynomial therefore allows us to compute in parallel with up to 64 elements of any subfield of $\mathbb{F}_{2^{40}}$. For instance, by selecting $n = 1$ and $l = 64$ we perform 64 operations in $\mathbb{F}_2$ in parallel; selecting $n = 40$ and $l = 1$ we perform operations in a single copy of the finite field $\mathbb{F}_{2^{40}}$; whereas selecting $n = 8$ and $l = 16$ we perform SIMD operations on what is essentially the AES state matrix, namely 16 elements of $\mathbb{F}_{2^8}$.
3 Somewhat Homomorphic Scheme Supporting SIMD Operations in $\mathbb{K}_n$

In this section, we recall the Smart-Vercauteren variant of Gentry's somewhat homomorphic scheme and show that it can support SIMD operations in $r$ copies of the finite field $\mathbb{K}_n$ by modifying key generation. Note that the recent FHE schemes based on ring-LWE [3] also support such SIMD-style operations, and may be preferable in practice due to their improved key generation procedures; for an extension of some of the ideas in this paper to the ring-LWE schemes see [2, 13-16]. However, whilst our SIMD-style operations extend to the ring-LWE based somewhat homomorphic schemes, our parallel recryption step does not carry over. We will return to this point later on.


3.1 Smart-Vercauteren somewhat homomorphic scheme

Let $F \in \mathbb{Z}[X]$ be a monic irreducible polynomial of degree $N$ and let $K = \mathbb{Q}(\theta) = \mathbb{Q}[X]/(F)$ denote the number field defined by $F$. Gentry's original scheme uses two co-prime ideals $I$ and $J$ in the number ring $\mathbb{Z}[\theta]$. The ideal $I$ is chosen to have small norm $\mathcal{N}(I) = \#(\mathbb{Z}[\theta]/I)$ and determines the plaintext space, namely $\mathbb{Z}[\theta]/I$. For this reason, $I = (2)$ is chosen in practice. Note that in the case of a general $F$ the quotient ring $\mathbb{Z}[\theta]/(2)$ is an algebra of a somewhat more general type than discussed in Section 2. We shall choose $F$ later on such that one obtains precisely the type of algebra considered in Section 2. The ideal $J$ determines the private/public key pair: the private key consists of a "good" representation of $J$, whereas the public key consists of a "bad" representation of $J$.

To clarify the notions of "good" and "bad", we first describe the Smart-Vercauteren instantiation. The ideal $J$ is chosen to be principal, i.e. generated by one element $\gamma \in \mathbb{Z}[\theta]$, and has the following additional property. Let $d = \mathcal{N}(J) = \#(\mathbb{Z}[\theta]/J) = |N_{K/\mathbb{Q}}(\gamma)|$, where $N_{K/\mathbb{Q}}$ denotes the number field norm from $K$ to $\mathbb{Q}$. Then there exists a unique $\alpha \in \mathbb{Z}_d$ such that

$$J = (\gamma) = (d, \theta - \alpha).$$

The element $\alpha$, and the integer $d$, can be computed in polynomial time by, for example, computing the Hermite Normal Form representation of the ideal.

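The "good" representation of $J$ (i.e. the private key) corresponds to the small generator $\gamma$, whereas the "bad" representation (i.e. the public key) is $(d, \theta - \alpha)$. The additional property of $J$ is equivalent to the requirement that the Hermite Normal Form representation of $J$ has the following specific form

$$\begin{pmatrix} d & 0 & 0 & \cdots & 0 \\ -\alpha & 1 & 0 & \cdots & 0 \\ -\alpha^2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -\alpha^{N-1} & 0 & 0 & \cdots & 1 \end{pmatrix}$$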

where the entries below $d$ in the first column are taken modulo $d$. Another characterisation of this property is that the ideal $J$ simply contains an element of the form $\theta - \alpha$. This is clearly necessary since $J$ can be generated by $(d, \theta - \alpha)$, but it is also sufficient. Indeed, since $d$ is a multiple of $\gamma$ in $\mathbb{Z}[\theta]$, we have $d \in J$. Hence $(d, \theta - \alpha) \subseteq J$, and since both ideals have the same norm, we must have $J = (d, \theta - \alpha)$. As such, there exists an element $v \in \mathbb{Z}[\theta]$ with $v \cdot \gamma = \theta - \alpha$. To derive an easily verifiable condition on $\gamma$, we define the algebraic number $\zeta \in \mathbb{Z}[\theta]$ such that

$$\zeta \cdot \gamma = d. \qquad (1)$$

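Multiplying $v \cdot \gamma = \theta - \alpha$ on both sides by $\zeta$ gives the condition $d \cdot v = \theta \cdot \zeta - \alpha \cdot \zeta$. Write $\zeta = \sum_{i=0}^{N-1} \zeta_i \theta^i$ and $F(X) = X^N + \sum_{i=0}^{N-1} f_i X^i$. Computing the product $\theta \cdot \zeta$ explicitly (using $\theta^N = -\sum_{i=0}^{N-1} f_i \theta^i$) and reducing modulo $d$ finally leads to:

$$\alpha \cdot \zeta_i \equiv \zeta_{i-1} - f_i \cdot \zeta_{N-1} \pmod{d}, \qquad (2)$$

for all $i = 0, \ldots, N-1$, where $\zeta_{-1} = 0$.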
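As a small sanity check of (1) and (2), here is a hand-worked toy instance chosen purely for illustration; it is far too small to be secure. Take $N = 2$ and $F(X) = X^2 + 1$, so that $f_0 = 1$ and $f_1 = 0$, and take $\gamma = 3 + 2\theta$, which satisfies $\gamma \equiv 1 \bmod 2$:

$$\zeta = 3 - 2\theta, \qquad \zeta \cdot \gamma = 9 - 4\theta^2 = 9 + 4 = 13 = d,$$
$$i = 0: \quad 3\alpha \equiv \zeta_{-1} - f_0 \zeta_1 = 2 \pmod{13} \;\Longrightarrow\; \alpha \equiv 2 \cdot 3^{-1} \equiv 2 \cdot 9 \equiv 5 \pmod{13},$$
$$i = 1: \quad \alpha \zeta_1 = 5 \cdot (-2) = -10 \equiv 3 = \zeta_0 - f_1 \zeta_1 \pmod{13}.$$

Both instances of (2) hold, so $J = (3 + 2\theta) = (13, \theta - 5)$. Consistently, $F(5) = 26 \equiv 0$ and $3 + 2 \cdot 5 = 13 \equiv 0 \pmod{13}$.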

Note that the two-element representation $(d, \theta - \alpha)$ defines an easily computable homomorphism

$$H : \mathbb{Z}[\theta] \to \mathbb{Z}_d : \eta = \sum_{i=0}^{N-1} \eta_i \theta^i \mapsto H(\eta) = \sum_{i=0}^{N-1} \eta_i \alpha^i \bmod d. \qquad (3)$$

The homomorphism $H$ also makes it very easy to test if an element $\eta \in \mathbb{Z}[\theta]$ is contained in the ideal $J$, namely $\eta \in J$ if and only if $H(\eta) = 0$. Furthermore, given the "good" representation $\gamma$, it is possible to invert $H$ on a small subset of $\mathbb{Z}[\theta]$, as shown by the following lemma.
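The map $H$ is just evaluation of the coefficient vector at $\alpha$ followed by reduction modulo $d$. The sketch below checks the homomorphism property on a hypothetical toy pair $(d, \alpha) = (17, 2)$ for $F(X) = X^4 + 1$. This pair was chosen only because $F(2) = 17 \equiv 0 \pmod{17}$, and it is not derived from an actual $\gamma$. The helper names `H` and `mul_mod_F` are our own.

```python
# Hypothetical toy pair: for F(X) = X^4 + 1, alpha = 2 satisfies
# F(alpha) = 17 = 0 (mod d) with d = 17, so H below is a ring homomorphism.
N, D, ALPHA = 4, 17, 2

def H(eta: list[int]) -> int:
    """H(eta) = sum_i eta_i * alpha^i mod d, evaluated by Horner's rule."""
    acc = 0
    for coeff in reversed(eta):
        acc = (acc * ALPHA + coeff) % D
    return acc

def mul_mod_F(a: list[int], b: list[int]) -> list[int]:
    """Product in Z[X]/(X^N + 1): X^N wraps around to -1."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

eta1, eta2 = [3, -1, 4, 1], [5, 9, -2, 6]
print(H([x + y for x, y in zip(eta1, eta2)]) == (H(eta1) + H(eta2)) % D)  # additive
print(H(mul_mod_F(eta1, eta2)) == (H(eta1) * H(eta2)) % D)                # multiplicative
print(H([-ALPHA, 1, 0, 0]))   # theta - alpha lies in J, so this prints 0
```

Membership in $J$ is therefore a single evaluation. Inverting $H$ is the hard direction, and it requires the trapdoor $\gamma$ (or $\zeta$).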
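Lemma 1 Let $J = (\gamma) = (d, \theta - \alpha)$ and $\zeta \cdot \gamma = d$, and let $H$ be defined as in (3). Let $\eta \in \mathbb{Z}[\theta]$ with $\|\eta\|_\infty < U$; then we have

$$\eta = H(\eta) - \left\lfloor \frac{H(\eta) \cdot \zeta}{d} \right\rceil \cdot \gamma \qquad \text{for} \qquad U = \frac{d}{2 \cdot \delta_\infty \cdot \|\zeta\|_\infty},$$

where $\delta_\infty = \sup\left\{ \|a \cdot b\|_\infty / (\|a\|_\infty \cdot \|b\|_\infty) : a, b \in \mathbb{Z}[\theta] \setminus \{0\} \right\}$. Furthermore, for $\|\eta\|_\infty < U$ we have (with $[\cdot]_d$ applied coefficient-wise)

$$[H(\eta) \cdot \zeta]_d = [\eta \cdot \zeta]_d = \eta \cdot \zeta. \qquad (4)$$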

Proof It is easy to see that $H(\eta) - \eta$ is contained in the principal ideal generated by $\gamma$. As such, there exists a $\beta \in \mathbb{Z}[\theta]$ such that $H(\eta) - \eta = \beta \cdot \gamma$. Using $\zeta = d/\gamma$, we can write

$$\beta = \frac{H(\eta) \cdot \zeta}{d} - \frac{\eta \cdot \zeta}{d}. \qquad (5)$$

Since $\beta$ has integer coefficients, we can recover it by rounding the coefficients of the first term if the coefficients of the second term are strictly bounded by $1/2$. This shows that $\eta$ can be recovered from $H(\eta)$ for $\|\eta\|_\infty < d/(2 \cdot \delta_\infty \cdot \|\zeta\|_\infty)$. Furthermore, equation (5) shows that $[H(\eta) \cdot \zeta]_d = [\eta \cdot \zeta]_d$, and since $\|\eta\|_\infty < U$, we have $[\eta \cdot \zeta]_d = \eta \cdot \zeta$.

Corollary 1 Using the notation of Lemma 1, assume that $\|\eta\|_\infty < U/L$ for some $L > 1$; then for $i = 0, \ldots, N-1$ we have

$$\left| \frac{H(\eta) \cdot \zeta_i}{d} - \left\lfloor \frac{H(\eta) \cdot \zeta_i}{d} \right\rceil \right| < \frac{1}{2L},$$

i.e. $H(\eta) \cdot \zeta_i / d$ is within distance $1/(2L)$ of an integer, where (as before) $\zeta_i$ is the $i$-th coefficient of $\zeta$ in the polynomial basis.

Proof Follows directly from equation (5) and the assumption on $\eta$.

The above lemma shows that we can recover an element $\eta$ from its image under $H$ when its norm is not too large. As such we obtain a trapdoor one-way function that can be used as the basis for encryption. Using these preliminaries we are now ready to define key generation, encryption and decryption.

Key Generation: Input parameters: $N$, $t$

Generate a monic irreducible polynomial $F \in \mathbb{Z}[X]$ of degree $N$ with small coefficients, defining the number field $K = \mathbb{Q}(\theta) = \mathbb{Q}[X]/(F)$. Choose an element $\gamma \in \mathbb{Z}[\theta]$ with $\gamma \equiv 1 \bmod 2$ such that the coefficients of $\gamma$ are smaller than $2^t$ in absolute value (at least one coefficient should be a $t$-bit integer). This can be done, for example, by uniformly selecting $\gamma$ from all polynomials of degree $N-1$ with coefficients bounded by $2^t$ in absolute value, although other distributions are possible. Compute the norm $d = |N_{K/\mathbb{Q}}(\gamma)|$ as well as the element $\zeta \in \mathbb{Z}[\theta]$ with $\zeta \cdot \gamma = d$. If $d$ is even, choose a new $\gamma$. If $d$ is odd, compute $\alpha = -\zeta_{N-1} \cdot f_0 / \zeta_0 \bmod d$ and verify whether (2) holds for all $i = 1, \ldots, N-1$. If not, generate a new $\gamma$. Otherwise, the public key is the pair $pk := (d, \alpha)$ whereas the private key is the element $sk := \zeta$.

In practice, $N$ will be of the order of a few thousand and $t$ a few hundred. The size of $d$ can be approximated roughly by $\sqrt{N}^N \cdot 2^{t \cdot N}$; this therefore results in a $d$ of several million bits.

Encryption: Input parameters: $\mu$, $pk := (d, \alpha)$, message $M \in A := \mathbb{F}_2[X]/(F(X))$

The plaintext space consists of (a subalgebra of) the algebra $A := \mathbb{F}_2[X]/(F(X))$. Represent the message $M$ as a polynomial $M(X) \in \mathbb{Z}[X]$ with coefficients in $\{0, 1\}$. Uniformly generate a "noise" polynomial $R(X) \in \mathbb{Z}[X]$ of degree $< N$, subject to $\|R(X)\|_\infty \le \mu$, and compute the ciphertext as

$$c \leftarrow [M(\alpha) + 2 \cdot R(\alpha)]_d.$$

Note that the ciphertext is an element of $\mathbb{Z}_d$ and that encryption simply corresponds to applying the homomorphism $H$ to the algebraic integer $C(\theta) := M(\theta) + 2 \cdot R(\theta)$. Furthermore, it should be clear that if we can recover $C(\theta)$, then we can decrypt simply by computing $C(X) \bmod 2$. The encryption function is denoted as $c \leftarrow \mathrm{Encrypt}(M(X), pk)$. If $M(X) \in A$ then we say $M(\alpha) \pmod d$ is a "trivial" encryption of $M(X)$, i.e. it is an encryption with no randomness.

Decryption: Input parameters: ciphertext $c \in \mathbb{Z}_d$, $sk := \zeta$

Given the ciphertext $c \in \mathbb{Z}_d$, compute the element $C(\theta)$ as

$$C(\theta) = c - \left\lfloor \frac{c \cdot \zeta}{d} \right\rceil \cdot \gamma$$

and then set $M(X) = C(X) \bmod 2$. Note that here we used the fact that $\gamma \equiv 1 \bmod 2$. We can obtain a simpler decryption procedure using the last statement in Lemma 1. Indeed, if $c$ is a decryptable ciphertext, we know that $\|C(\theta)\|_\infty < U$ and thus that

$$[c \cdot \zeta]_d = C(\theta) \cdot \zeta.$$
[c-C]d  =  CW-C- 
Since $\gamma \equiv 1 \bmod 2$ and $d$ is odd with $d = \gamma \cdot \zeta$, we see that also $\zeta \equiv 1 \bmod 2$. Furthermore, $C(\theta) = M(\theta) + 2R(\theta)$, so we obtain

$$[c \cdot \zeta]_d \bmod 2 = M(\theta) \bmod 2 = M(X).$$

This shows that we can recover the coefficients of $M(X) = m_0 + m_1 \cdot X + \cdots + m_{N-1} \cdot X^{N-1}$ one by one, by computing

$$m_i = [c \cdot \zeta_i]_d \pmod 2.$$
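The following minimal sketch ties key generation, encryption and decryption together end to end. It rests on several simplifying assumptions of our own:

- It uses $F(X) = X^N + 1$ with $N$ a power of two. The plaintext algebra then does not split, so there is no SIMD structure here; the sketch only exercises the basic scheme.
- It uses tiny, insecure parameters $N = 8$ and $t = 12$.
- It computes $d$ and $\zeta$ exactly with the standard trick that $u(X) \cdot u(-X)$ is a polynomial in $X^2$.

Key generation rejects $\gamma$ unless (2) holds and the bound $U$ of Lemma 1 exceeds the largest noise the demo creates. For $X^N + 1$ this uses $\delta_\infty = N$. All function names are hypothetical.

```python
import random
from math import gcd

def negacyclic_mul(a: list[int], b: list[int]) -> list[int]:
    """Product in Z[X]/(X^n + 1) with n = len(a) = len(b)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def norm_and_zeta(gamma: list[int]) -> tuple[int, list[int]]:
    """Exact d = |N(gamma)| and zeta with zeta * gamma = d in Z[X]/(X^n + 1), n a power of 2."""
    n = len(gamma)
    zeta, u, step = [1] + [0] * (n - 1), list(gamma), 1   # invariant: gamma*zeta = u(X^step)
    while len(u) > 1:
        u_neg = [c if k % 2 == 0 else -c for k, c in enumerate(u)]   # u(-y)
        lifted = [0] * n
        for k, c in enumerate(u_neg):                               # substitute y = X^step
            lifted[k * step] = c
        zeta = negacyclic_mul(zeta, lifted)
        u = negacyclic_mul(u, u_neg)[0::2]                          # u(y)u(-y) = w(y^2)
        step *= 2
    sign = 1 if u[0] > 0 else -1
    return sign * u[0], [sign * z for z in zeta]

def centered(x: int, d: int) -> int:
    """[x]_d in [-d/2, d/2)."""
    r = x % d
    return r - d if 2 * r >= d else r

def keygen(n: int, t: int, max_norm: int, rng: random.Random):
    """Toy keygen for F = X^n + 1 (f_0 = 1, f_i = 0 for 0 < i < n); returns ((d, alpha), zeta)."""
    half = 2 ** (t - 1)
    while True:
        gamma = [2 * rng.randrange(-half + 1, half) for _ in range(n)]
        gamma[0] += 1                                   # gamma = 1 (mod 2), so d is odd
        d, zeta = norm_and_zeta(gamma)
        if d % 2 == 0 or gcd(zeta[0], d) != 1:
            continue
        alpha = (-zeta[-1] * pow(zeta[0], -1, d)) % d   # (2) with i = 0
        if any((alpha * zeta[i] - zeta[i - 1]) % d for i in range(1, n)):
            continue                                    # (2) fails for some i >= 1
        if 2 * n * max(abs(z) for z in zeta) * max_norm >= d:
            continue                                    # U = d/(2 n ||zeta||) too small
        return (d, alpha), zeta

def H(eta: list[int], pk) -> int:
    d, alpha = pk
    acc = 0
    for coeff in reversed(eta):                         # Horner: eta(alpha) mod d
        acc = (acc * alpha + coeff) % d
    return acc

def encrypt(bits: list[int], pk, rng: random.Random, mu: int = 1) -> int:
    noise = [rng.randint(-mu, mu) for _ in bits]
    return H([m + 2 * r for m, r in zip(bits, noise)], pk)

def decrypt(c: int, pk, zeta: list[int]) -> list[int]:
    d, _ = pk
    return [centered(c * z, d) % 2 for z in zeta]      # m_i = [c * zeta_i]_d mod 2

rng = random.Random(2011)
N, T, MU = 8, 12, 1
pk, sk = keygen(N, T, max_norm=N * (1 + 2 * MU) ** 2, rng=rng)
d = pk[0]
m1, m2 = [1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1]
c1, c2 = encrypt(m1, pk, rng, MU), encrypt(m2, pk, rng, MU)
print(decrypt(c1, pk, sk) == m1)
print(decrypt((c1 + c2) % d, pk, sk) == [(a + b) % 2 for a, b in zip(m1, m2)])
print(decrypt(c1 * c2 % d, pk, sk) == [x % 2 for x in negacyclic_mul(m1, m2)])
```

With $\mu = 1$ a fresh $C(\theta)$ has $\|C\|_\infty \le 3$. A sum of two such ciphertexts stays below 6 and a product below $N \cdot 9 = 72$, which is why `keygen` is asked for `max_norm = 72`.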

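We write $M(X) \leftarrow \mathrm{Decrypt}(c, sk)$. Note that to save space for key storage, it suffices to store $\zeta_0$, since the other $\zeta_i$ follow from equation (2). In particular, solving (2) recursively from $i = N-1$ downwards and then using the case $i = 0$, we obtain the closed expression $\zeta_i = W_i \cdot \zeta_0$ with

$$W_i = -\frac{1}{f_0} \sum_{k=i+1}^{N} f_k \cdot \alpha^{k-i} \pmod{d}, \qquad (6)$$

where $f_N = 1$ denotes the leading coefficient of $F$.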


Since the $W_i$ can be publicly computed, we can decrypt $m_i = [c \cdot W_i \cdot \zeta_0]_d \pmod 2$. We pause to note that it is this linear relationship between the distinct decryption keys $\zeta_i$ which enables the parallel recryption procedure we describe later. For ring-LWE based somewhat homomorphic schemes supporting SIMD operations, where such a simple linear relation does not hold, it seems much harder to produce a parallel recryption procedure using the squashing paradigm of Gentry, although see [14] for a possibly more efficient method in this direction.
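The public constants $W_i$ can be sketched as a direct evaluation of the closed form for $W_i$ given above. The function name is hypothetical. As a check, it uses the toy values $F(X) = X^2 + 1$, $d = 13$, $\alpha = 5$ and $\zeta = 3 - 2\theta$; indeed $(3 - 2\theta)(3 + 2\theta) = 13$ and $F(5) = 26 \equiv 0 \pmod{13}$. From $\zeta_0 = 3$ alone it recovers $\zeta_1 \equiv -2 \pmod{13}$.

```python
def w_constants(f: list[int], alpha: int, d: int) -> list[int]:
    """W_i = -(1/f_0) * sum_{k=i+1}^{N} f_k * alpha^(k-i) mod d, for i = 0..N-1.
    f = [f_0, ..., f_N] lists the coefficients of F, with f_N = 1."""
    n = len(f) - 1
    inv_f0 = pow(f[0], -1, d)          # assumes gcd(f_0, d) = 1
    return [(-inv_f0 * sum(f[k] * pow(alpha, k - i, d) for k in range(i + 1, n + 1))) % d
            for i in range(n)]

W = w_constants([1, 0, 1], alpha=5, d=13)
print(W, [w * 3 % 13 for w in W])      # [1, 8] [3, 11], i.e. zeta_1 = 11 = -2 (mod 13)
```

For $F(X) = X^N + 1$ the formula collapses to $W_i = -\alpha^{N-i}$, so $W_0 = -\alpha^N \equiv 1 \pmod d$ as it must.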

Homomorphic Operations: It is easy to see that the scheme is somewhat homomorphic, where the operations being performed are addition and multiplication of ciphertexts modulo $d$. Indeed, let $c_i = H(C_i(\theta)) = H(M_i(\theta) + 2R_i(\theta))$ for $i = 1, 2$; then we have that

$$c_1 + c_2 = H\big(M_1(\theta) + M_2(\theta) + 2(R_1(\theta) + R_2(\theta))\big)$$
$$c_1 \cdot c_2 = H\big(M_1(\theta) \cdot M_2(\theta) + 2(M_1(\theta)R_2(\theta) + M_2(\theta)R_1(\theta) + 2R_1(\theta)R_2(\theta))\big).$$

This shows that operations on the ciphertext space induce corresponding operations on the plaintext space, i.e. the algebra $A$, as long as the resulting $C(\theta)$ stays below the bound $U$ of Lemma 1. Thus the somewhat homomorphic scheme supports SIMD operations and operations on elements in possibly large degree (i.e. degree $n$) finite fields. To make a distinction when we are performing homomorphic operations we will use the notation $\oplus$ and $\otimes$ to denote the homomorphic addition and multiplication of ciphertexts.


3.2  Efficient  key  generation  and  SIMD  operations 

Whilst the FHE scheme works for any polynomial $F$ with small coefficients, the common case, as in [12] and [23], is to use the polynomial $F(X) := X^N + 1$ with $N$ a power of two. As pointed out by Gentry and Halevi [12], this enables major improvements in the key generation procedure over that proposed by Smart and Vercauteren [23]. If we let $\eta_i$ denote the roots of the polynomial $F$ over the complex numbers, or over a sufficiently large finite field, then we can compute $\zeta$ and $d$ as follows:

- Compute $\omega_i \leftarrow \gamma(\eta_i) \in \mathbb{C}$ for all $i$.
- Compute $d \leftarrow \left| \prod_i \omega_i \right|$.
- Compute $\omega_i' \leftarrow 1/\omega_i$.
- Interpolate the polynomial $\zeta/d$ from the data values $\omega_i'$.
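The four steps above can be sketched directly, under assumptions of our own:

- $F(X) = X^N + 1$, with its roots $\eta_k = e^{i\pi(2k+1)/N}$ taken over the complex numbers.
- A naive $O(N^2)$ evaluation and interpolation in place of a real FFT.
- Double-precision floats, which only round correctly for tiny $\gamma$; practical key generation needs far higher precision or exact modular arithmetic.

The output is verified exactly by checking $\zeta \cdot \gamma = d$ in $\mathbb{Z}[X]/(X^N + 1)$. The function names are hypothetical.

```python
import cmath

def negacyclic_mul(a: list[int], b: list[int]) -> list[int]:
    """Exact product in Z[X]/(X^n + 1)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def fft_style_keygen(gamma: list[int]) -> tuple[int, list[int]]:
    """Evaluate gamma at the roots of X^N + 1, multiply to get d, invert, interpolate zeta/d."""
    n = len(gamma)
    roots = [cmath.exp(1j * cmath.pi * (2 * k + 1) / n) for k in range(n)]   # eta_k
    omega = [sum(g * r ** j for j, g in enumerate(gamma)) for r in roots]    # gamma(eta_k)
    norm = 1
    for w in omega:
        norm *= w
    d = abs(round(norm.real))                        # d = |prod_k omega_k|
    inv = [1 / w for w in omega]                     # omega'_k = 1/omega_k
    # inverse transform: (zeta/d)_j = (1/n) * sum_k omega'_k * eta_k^(-j)
    zeta = [round((d * sum(v * r ** (-j) for v, r in zip(inv, roots)) / n).real)
            for j in range(n)]
    return d, zeta

gamma = [3, 2, -4, 0, 2, 6, -2, 4]
d, zeta = fft_style_keygen(gamma)
print(d, negacyclic_mul(zeta, gamma) == [d] + [0] * (len(gamma) - 1))
```

Rounding at the end is what makes the floating-point route work. It is only safe while the accumulated error stays well below $1/2$, which is the reason for the precision caveat above.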

The key observation is that since $F(X)$ is of the form $X^N + 1$, the $\eta_i$ are primitive $2N$-th roots of unity, i.e. $2^{n+1}$-th roots of unity when $N = 2^n$. To perform the polynomial evaluation and interpolation above we can therefore apply the Fast Fourier Transform (FFT). Indeed, Gentry and Halevi present an even more optimized scheme to compute $d$ and $\zeta$ which requires only polynomial arithmetic, but this makes significant use of the fact that the trace of 2-power roots of unity is always zero.

The problem with selecting $F(X) = X^N + 1$ is that it has only one irreducible factor modulo two. In particular, if we select $F(X) = X^N + 1$ then the underlying plaintext algebra is given by

$$A := \mathbb{F}_2[X]/(F) \cong \mathbb{F}_2[X]/\big((X - 1)^N\big).$$

In other words, $F$ does not split into a set of distinct irreducible factors modulo two, as is required to enable SIMD operations.

We now present a possible replacement for $F(X)$. The key observation is that we need an $F(X)$ which enables fast key generation via FFT-like algorithms, which has small coefficients, and which splits into distinct irreducible factors modulo two of the same degree. In addition we need a relatively large supply of such polynomials to cope with increasing security levels (i.e. $N$), different numbers of parallel operations (i.e. $l$) and different finite fields of characteristic two in which operations occur (i.e. $n$). In particular we need to pick an $F(X)$ which generates a Galois extension of $\mathbb{Q}$. In addition we need to select a polynomial $F(X)$ such that 2 is neither ramified, nor an index divisor, in the associated number field generated by a root of $F(X)$. These conditions ensure that the algebra modulo two splits into distinct finite fields of the same degree.

One is then led to consider other cyclotomic polynomials as follows. We select an odd integer $m$ and recall that the $m$-th cyclotomic polynomial is defined by

$$\Phi_m(X) := \prod_{\eta} (X - \eta),$$

where $\eta$ ranges over all primitive $m$-th roots of unity. We have $\deg(\Phi_m(X)) = \phi(m)$, and $\Phi_m(X)$ is an irreducible polynomial with integer coefficients. In the practical range for $m$, the coefficients of $\Phi_m$ are very small; e.g. for all $m < 40000$ the coefficients are bounded by 59 and are in most cases much smaller than this upper bound.

The field $\mathbb{Q}(\theta)$ is a Galois extension and hence each prime ideal splits in $\mathbb{Q}(\theta)$ into a product of prime ideals of the same degree and ramification index. If $m$ is odd then the prime two does not ramify in the field $\mathbb{Q}(\theta)$, nor is it an index divisor. In particular, by Dedekind's criterion, this means that the polynomial $\Phi_m(X)$, of degree $N = \phi(m)$, factors modulo two into a product of $r = N/d$ distinct irreducible polynomials of degree equal to the unique degree $d$ of the prime ideals lying above the ideal $(2)$. This degree $d$ is the smallest integer such that $2^d \equiv 1 \pmod m$.
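The slot parameters for a given odd $m$ follow from the facts above: $N = \phi(m)$, $d$ is the multiplicative order of 2 modulo $m$, and $r = N/d$. The short sketch below computes them with plain integer arithmetic; the helper names are our own. It reproduces the parameters of the 3485-th cyclotomic polynomial used as the running example in Section 2.

```python
def euler_phi(m: int) -> int:
    """Euler's totient by trial-division factorisation (fine for small m)."""
    result, x, p = m, m, 2
    while p * p <= x:
        if x % p == 0:
            while x % p == 0:
                x //= p
            result -= result // p
        p += 1
    if x > 1:
        result -= result // x
    return result

def order_of_two(m: int) -> int:
    """Smallest d >= 1 with 2^d = 1 (mod m); m must be odd and > 1."""
    if m % 2 == 0 or m <= 1:
        raise ValueError("m must be odd and > 1")
    d, x = 1, 2 % m
    while x != 1:
        x = x * 2 % m
        d += 1
    return d

def slot_parameters(m: int) -> tuple[int, int, int]:
    """(N, d, r): degree of Phi_m, degree of each factor mod 2, number of factors."""
    N, d = euler_phi(m), order_of_two(m)
    return N, d, N // d

print(slot_parameters(3485))   # (2560, 40, 64)
```

Since $3485 = 5 \cdot 17 \cdot 41$, the order of 2 is $\mathrm{lcm}(4, 8, 20) = 40$, in agreement with the 64 factors of degree 40 quoted earlier.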

Hence, by selecting $F(X) := \Phi_m(X)$ in our construction of the algebra $A$ over $\mathbb{F}_2$, we find that $A$ is isomorphic to a product of $r$ finite fields of degree $d = N/r$. The only issue is whether one can perform the key generation efficiently. To do this we use Fourier Transforms with respect to the $m$-th roots of unity. In particular, given the polynomial $\gamma$ in the key generation procedure, we compute the evaluation at the $m$-th roots of unity via a Fourier Transform. We then produce the norm $d$ by selecting the $N$ required values to multiply together, namely the evaluations at the primitive roots of unity. One can then compute $1/\gamma$ by inverting the Fourier coefficients and then interpolating via the inverse Fourier Transform.

In other words, the same optimization as mentioned earlier can be applied. Instead of taking the standard Cooley-Tukey [7] FFT method for powers of two, we apply the Good-Thomas method [17, 25] when $m$ is a product of two coprime integers, or Cooley-Tukey when $m$ is a prime power. Either method reduces the problem to computing FFTs for prime power values of $m$, for which we can use the Rader FFT algorithm [21]. This in itself reduces the problem to computing a convolution of two sequences, which is then performed by extending the sequences to a power-of-two length, followed by the application of the Cooley-Tukey algorithm to the extended sequence. Overall the FFT then takes $O(m \cdot \log m)$ operations on elements of size $O(\log_2 d)$ bits. In practice $m \approx 2 \cdot N$, and so this gives the same complexity for key generation as using $F(X) = X^N + 1$; however, the implied constants are slightly greater. This means we can achieve almost the same complexity for key generation as in the 2-power root of unity case. In [22] the above approach is extended and further optimizations are applied, so as to reduce the cost to nearer to what one sees when using $F(X) = X^N + 1$.


4  Fully  Homomorphic  Scheme  and  Naive  Recryption  Method 

To  turn  the  somewhat  homomorphic  scheme  of  the  previous  section  into  a  fully  homomorphic  scheme,  we  follow 
Gentry’s  bootstrapping  approach,  i.e.  we  squash  the  decryption  circuit  so  much  that  it  can  be  evaluated  by  the  somewhat 
homomorphic  scheme.  In  particular,  we  use  the  optimized  procedure  described  by  Gentry  and  Halevi  in  [12]. 


4.1 The Recryption Method of Gentry and Halevi

Recall that each message bit can be recovered as $m_i = [c \cdot W_i \cdot \zeta_0]_d \pmod 2$, with the $W_i$ being publicly computable constants defined in (6). Since $[c \cdot W_i]_d$ can be computed without knowledge of $\zeta_0$, it suffices to show how $[c \cdot \zeta_0]_d \pmod 2$ can be computed with a low complexity circuit.

The idea is to write the private key $\zeta_0$ as the solution to a sparse-subset-sum problem. In particular, we will define $s$ sets of $S$ elements as follows (a discussion on the sizes of $s$ and $S$ will be given later). Choose $s$ elements $x_i \in [0, \ldots, d)$ and a random integer $R \in [1, \ldots, d)$, and define the $i$-th set $B_i = \{x_i \cdot R^j \pmod d \mid j \in [0, \ldots, S)\}$ such that the private key $\zeta_0$ can be written as the sum

$$\zeta_0 = \sum_{i=1}^{s} \sum_{j=0}^{S-1} b_{i,j} \cdot x_i \cdot R^j \pmod d,$$

where for each $i$ only one $b_{i,j} = 1$ and all other $b_{i,j}$ are zero. The index $j$ for which $b_{i,j} = 1$ will be denoted by $k_i$, and so we can write $\zeta_0 = \sum_{i=1}^{s} x_i \cdot R^{k_i} \pmod d$. The result is that we have written $\zeta_0$ as the sum of $s$ elements, where one element is taken from each $B_i$. To enable recryption or ciphertext cleaning, we will augment the public key with additional information. Compute the ciphertexts $c_{i,j} \leftarrow \mathrm{Encrypt}(b_{i,j}, pk)$ for $1 \le i \le s$, $0 \le j < S$; the public key then consists of the data

$$\left(d,\ \alpha,\ s,\ S,\ R,\ \left\{x_i, \{c_{i,j}\}_{j=0}^{S-1}\right\}_{i=1}^{s}\right).$$

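Denote $y_{i,j} = c \cdot x_i \cdot R^j \pmod d$ for $i = 1, \ldots, s$ and $j = 0, \ldots, S-1$, with $0 \le y_{i,j} < d$. The decryption function $[c \cdot \zeta_0]_d \pmod 2$ can then be rewritten as

$$\begin{aligned}
[c \cdot \zeta_0]_d \pmod 2 &= \left[\sum_{i=1}^{s} \sum_{j=0}^{S-1} b_{i,j} \cdot y_{i,j}\right]_d \pmod 2 \\
&= \left(\sum_{i=1}^{s} \sum_{j=0}^{S-1} b_{i,j} \cdot y_{i,j} - d \cdot \left\lfloor \sum_{i=1}^{s} \sum_{j=0}^{S-1} \frac{b_{i,j} \cdot y_{i,j}}{d} \right\rceil\right) \pmod 2 \\
&= \left(\bigoplus_{i=1}^{s} \bigoplus_{j=0}^{S-1} b_{i,j} \cdot y_{i,j} \pmod 2\right) \oplus \left(\left\lfloor \sum_{i=1}^{s} \sum_{j=0}^{S-1} b_{i,j} \cdot \frac{y_{i,j}}{d} \right\rceil \pmod 2\right),
\end{aligned}$$

where the last step uses that $d$ is odd.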


Note that the latter double sum $T = \sum_{i=1}^{s} \sum_{j=0}^{S-1} b_{i,j} \cdot y_{i,j}/d$ is equal to $[c \cdot \zeta_0]_d/d$ plus an integer, and if we assume that $c$ is the image of $C(\theta)$ under $H$, where $\|C(\theta)\|_\infty < \mu/(s+1)$, then we know by Corollary 1 that $T$ is within distance $1/(2(s+1))$ of an integer. If we now replace each $y_{i,j}/d$ with an approximation $z_{i,j}$ up to $p$ bits after the binary point, i.e. $|z_{i,j} - y_{i,j}/d| \le 2^{-(p+1)}$, then since there are only $s$ non-zero terms, we have that $\left|T - \sum_{i}\sum_{j} b_{i,j} \cdot z_{i,j}\right| \le s \cdot 2^{-(p+1)}$. Rounding the double sum over the $z_{i,j}$ will thus give the same result as rounding $T$ as long as

$$\frac{1}{2(s+1)} + s \cdot 2^{-(p+1)} < \frac{1}{2},$$

which holds as soon as $2^p > s+1$, i.e. $p > \log_2(s+1)$. Furthermore, in the inner sum we are adding $S$ numbers of which only one is non-zero. As such, we can compute the $k$-th bit of this sum by simply XOR-ing the $k$-th bits of the $b_{i,j} \cdot z_{i,j}$ for $j = 0, \ldots, S-1$. We are then left with an addition of $s$ numbers, each of which consists of $p$ bits after the binary point.
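As a sanity check of the rewriting above, the following plaintext sketch (no encryption is involved) builds a toy sparse-subset-sum key, picks $c$ so that $[c \cdot \zeta_0]_d$ lies within $d/(2(s+1))$ of zero, and confirms that the parity term XOR the rounded sum of the $p$-bit approximations $z_{i,j}$ reproduces $[c \cdot \zeta_0]_d \pmod 2$. The helper names and the toy parameters ($d = 2^{31}-1$, $s = 4$, $S = 8$, $p = 3$, chosen so that $2^p > s+1$) are assumptions for illustration, not values from the paper; Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
import math
import random
from fractions import Fraction


def centered_mod(x, d):
    """[x]_d: reduce x into the interval (-d/2, d/2] (d odd)."""
    r = x % d
    return r - d if r > d // 2 else r


def round_half_up(q):
    """Nearest integer to the Fraction q (ties go up)."""
    return math.floor(q + Fraction(1, 2))


def bit_from_subset_sum(c, d, xs, R, b, p):
    """(XOR of b_ij * y_ij mod 2) XOR (round(sum of b_ij * z_ij) mod 2)."""
    parity, approx_sum = 0, Fraction(0)
    for i, x in enumerate(xs):
        for j, b_ij in enumerate(b[i]):
            if b_ij:
                y = c * x * pow(R, j, d) % d  # y_ij in [0, d)
                parity ^= y & 1
                # z_ij: y_ij / d rounded to p bits after the binary point
                approx_sum += Fraction(round_half_up(Fraction(y << p, d)), 1 << p)
    return parity ^ (round_half_up(approx_sum) & 1)


def random_trial(rng, d=2**31 - 1, s=4, S=8, p=3):
    """Build a toy key, choose c with a small [c * zeta_0]_d, and compare both sides."""
    xs = [rng.randrange(d) for _ in range(s)]
    R = rng.randrange(1, d)
    picks = [rng.randrange(S) for _ in range(s)]             # j_i for each set B_i
    b = [[int(j == picks[i]) for j in range(S)] for i in range(s)]
    zeta0 = sum(x * pow(R, j, d) for x, j in zip(xs, picks)) % d
    if zeta0 == 0:                                            # zeta0 must be invertible mod the prime d
        return True
    bound = d // (2 * (s + 1))                                # keeps |e| / d < 1 / (2(s+1))
    e = rng.randrange(-bound + 1, bound)
    c = e * pow(zeta0, -1, d) % d                             # so that [c * zeta0]_d == e
    return bit_from_subset_sum(c, d, xs, R, b, p) == centered_mod(c * zeta0, d) & 1
```

Exact `Fraction` arithmetic is used so that the only approximation error is the deliberate truncation to $p$ bits.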

We are now ready to formulate the recrypt algorithm by mapping these equations into the encrypted domain. To this end, we require two helper functions. The first function b ← compute_bits(y) takes as input an integer $0 \le y < d$ and outputs the vector of bits $b = (b_0, b_1, \ldots, b_p)$ such that

$$\left|\frac{y}{d} - \left(b_0 + \frac{b_1}{2} + \cdots + \frac{b_p}{2^p}\right)\right| \le \frac{1}{2^{p+1}}.$$

This is easily computed by determining $u \leftarrow \lfloor (2^p \cdot y)/d \rceil$, and then reading the bits from the (small) integer $u$.
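Since $y$ is computed from public data, compute_bits runs in the clear. A minimal sketch of it follows; passing $d$ and $p$ as explicit arguments and the helper `bits_value` are assumptions of this illustration.

```python
from fractions import Fraction


def compute_bits(y, d, p):
    """Return [b_0, ..., b_p] with |y/d - sum_k b_k / 2^k| <= 1 / 2^(p+1).

    u = round(2^p * y / d), then the bits of the small integer u are read off,
    most significant first (b_0 is the bit in front of the binary point).
    """
    if not 0 <= y < d:
        raise ValueError("compute_bits expects 0 <= y < d")
    u = (2 * (y << p) + d) // (2 * d)  # round-half-up of 2^p * y / d in exact integer arithmetic
    # y < d implies u <= 2^p, so u fits in p + 1 bits
    return [(u >> (p - k)) & 1 for k in range(p + 1)]


def bits_value(bits):
    """Value b_0 + b_1/2 + ... + b_p/2^p of a bit vector, as an exact Fraction."""
    return sum(Fraction(bit, 2**k) for k, bit in enumerate(bits))
```

For example, with $d = 7$ and $p = 3$ the value $6/7 \approx 0.857$ is approximated by $0.111_2 = 7/8$.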

The second function school_book_add(A) takes as input an $s \times (p+1)$ array $A$ of ciphertexts, where each row contains the encryptions of the $p+1$ bits of an integer. The result of the function is a vector of length $p+1$ containing the encryptions of the $p+1$ bits of the sum of these $s$ integers modulo $2^{p+1}$. The school-book method is discussed in more detail in [12], where it is shown that it takes time

$$T_{\mathrm{school\_book\_add}} = \left(s \cdot 2^{p} + \sum_{k=1}^{p-1} (s+k) \cdot 2^{p-k}\right) \cdot T_{\mathrm{mod}\,d},$$

where $T_{\mathrm{mod}\,d}$ denotes the time of performing one multiplication modulo $d$.
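The following plaintext sketch shows that school_book_add is an ordinary binary circuit, with XOR playing the role of homomorphic addition and AND that of homomorphic multiplication. It is an assumption of this sketch that a simple ripple-carry adder stands in for the Gentry-Halevi school-book method: it computes the same function (the sum of the $s$ rows modulo $2^{p+1}$), but its multiplicative depth and multiplication count differ, so it does not reproduce the cost formula above. Rows use the layout of compute_bits, most significant bit first.

```python
def full_add(x, y, carry):
    """Full adder on bits built only from XOR and AND."""
    total = x ^ y ^ carry
    carry_out = (x & y) ^ (carry & (x ^ y))  # majority(x, y, carry); the two terms are never both 1
    return total, carry_out


def add_rows(r1, r2):
    """Add two (p+1)-bit rows, most significant bit first, modulo 2^(p+1)."""
    out, carry = [0] * len(r1), 0
    for k in reversed(range(len(r1))):  # start from the least significant bit b_p
        out[k], carry = full_add(r1[k], r2[k], carry)
    return out  # the final carry is dropped: arithmetic is modulo 2^(p+1)


def school_book_add(A):
    """Sum of the s rows of A modulo 2^(p+1), as a bit vector in the same layout."""
    acc = [0] * len(A[0])
    for row in A:
        acc = add_rows(acc, row)
    return acc


def bits_to_int(bits):
    """Integer whose binary digits (most significant first) are `bits`."""
    return int("".join(map(str, bits)), 2)
```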

In Algorithm 1 we present the algorithm for recrypting the first bit of the message underlying a ciphertext $c$, i.e. the algorithm computes $[c \cdot \zeta_0]_d \pmod 2$ in the encrypted domain using the augmented public key. This is essentially the recryption algorithm used by Gentry and Halevi, where the message space is one bit only.


4.2  Some  Initial  Modifications 

Before progressing to our parallel recryption method we first pause to re-examine the Gentry-Halevi method in the context of larger message spaces. To obtain the recryption of the $i$-th coefficient we simply input $[c \cdot w_i]_d$ instead of $c$, since decrypting the $i$-th bit is given by $[c \cdot w_i \cdot \zeta_0]_d \pmod 2$. We denote the cost of executing this algorithm for a one-bit ciphertext as $T_{\mathrm{bits}}$. Ignoring the modular additions, and counting $(S+1) \cdot s$ multiplications modulo $d$ to generate the values $y_{i,j}$ plus one call to school_book_add, we see that

$$T_{\mathrm{bits}} = \left((S+1) \cdot s + s \cdot 2^{p} + \sum_{k=1}^{p-1} (s+k) \cdot 2^{p-k}\right) \cdot T_{\mathrm{mod}\,d}.$$
Algorithm 1: BitRecrypt(c, pk): Recrypting the First Bit of the Plaintext Associated With Ciphertext c

    A ← 0, where A ∈ M_{s×(p+1)}(Z_d).
    sum ← 0.
    for i from 1 upto s do
        y ← c · x_i (mod d).
        for j from 0 upto S − 1 do
            if y is odd then
                sum ← sum ⊕ c_{i,j}.
            b ← compute_bits(y).
            for u from 0 upto p do
                A_{i,u} ← A_{i,u} ⊕ (b_u · c_{i,j}).
            y ← y · R (mod d).
    a ← school_book_add(A).
    c ← sum ⊕ a_0.
    return (c).


To recrypt a whole ciphertext $c$, we first form the ciphertexts $c_i = \mathrm{BitRecrypt}([c \cdot w_i]_d, pk)$ for $i = 0, \ldots, N-1$, which are recryptions of the coefficients of the underlying polynomial $M(X)$ obtained by submitting $[c \cdot w_i]_d$ to Algorithm 1. Then, given the $c_i$, we form the ciphertext

$$c' \leftarrow \sum_{i=0}^{N-1} c_i \cdot \alpha^i \pmod d,$$

which will be a recryption of the original ciphertext. Note that, to control the noise, this last sum is computed naively and not via Horner's rule, i.e. we multiply each coefficient ciphertext $c_i$ by $\alpha^i \pmod d$ and then sum. The resulting algorithm is summarized in Algorithm 2. Assuming the $\alpha^i \pmod d$ and $w_i$ are precomputed, the total cost of recrypting a ciphertext corresponding to an arbitrary element in $\mathcal{A}$ (using our naive method) is essentially $N \cdot T_{\mathrm{bits}} + 2 \cdot N \cdot T_{\mathrm{mod}\,d}$.

Algorithm 2: Recrypting Ciphertext c, version 1

    c' ← 0.
    for i from 0 upto N − 1 do
        c_i ← BitRecrypt([c · w_i]_d, pk).
        c' ← c' ⊕ c_i · α^i (mod d).
    return (c').

If SIMD-style operations, and operations on larger datatypes, are to be supported, we therefore need a more efficient method to perform recryption, since the above cost could be prohibitive. We therefore now turn to utilizing our SIMD operations to improve the performance of recryption of such ciphertexts.


5  Parallel  Recryption 

Whilst Algorithm 2 will recrypt a ciphertext that encodes an arbitrary element of the algebra $\mathcal{A}$, it can be made significantly more efficient. Firstly, the procedure recrypts a general element in $\mathcal{A}$, yet in practice $c$ will only contain $l \cdot n \le N$ encrypted bits. Secondly, since the recrypt procedure is a binary circuit, we can run it on the $r$ embedded copies of $\mathbb{F}_2$, i.e. we can use the SIMD-style operations to recrypt $r$ bits in parallel.

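The first optimization is easy to obtain: recall that $\Gamma_{n,l}$ maps a vector of $l$ binary polynomials $(\kappa_1(Y), \ldots, \kappa_l(Y))$, each of degree less than $n$, into a polynomial $a(X)$ of degree less than $N$. The map $\Gamma_{n,l}$ thus defines an isomorphism between $\mathbb{K}_n^l$ and $\Gamma_{n,l}(\mathbb{K}_n^l)$, so $\Gamma_{n,l}^{-1}$ is well defined on the result of the computation. We can represent $\Gamma_{n,l}^{-1}$ explicitly by an $(n \cdot l) \times N$ binary matrix $B$ over $\mathbb{F}_2$, which is defined as follows:

$$\mathrm{coeff}(\kappa_i, j) = \sum_{k=0}^{N-1} B_{j + (i-1) \cdot n + 1,\, k+1} \cdot \mathrm{coeff}(a(X), k) \quad \text{for } 1 \le i \le l,\ 0 \le j < n.$$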

Using $B$ we can therefore first obtain encryptions of all the coefficients of the $\kappa_i$, recrypt these using Algorithm 1, and then reconstruct the recrypted ciphertext using $\Gamma_{n,l}$. In particular, denote by $C_{i_1,i_2}$ the recryption of the $i_1$-th coefficient of the $i_2$-th component in $\mathbb{K}_n^l$; then we can obtain a full recryption of an element in $\mathbb{K}_n^l$ by computing

$$c' \leftarrow \sum_{i_1=0}^{n-1} \sum_{i_2=1}^{l} C_{i_1,i_2} \cdot \Gamma_{n,l}(0, \ldots, 0, Y^{i_1}, 0, \ldots, 0)|_{\alpha},$$

where $(0, \ldots, 0, Y^{i_1}, 0, \ldots, 0) \in \mathbb{K}_n^l$ is the element whose $i_2$-th component is equal to $Y^{i_1}$, and $M(X)|_{\alpha}$ is the trivial encryption of the element $M(X)$ in the algebra $\mathcal{A}$.

Recall that, given a ciphertext $c$, the value $[c \cdot w_i]_d$ is an encryption of the $i$-th coefficient of $a(X)$. Since the scheme is homomorphic, using the matrix $B$ we conclude that

$$c_{i_1,i_2} = \left[\sum_{k=0}^{N-1} B_{i_1 + (i_2-1) \cdot n + 1,\, k+1} \cdot [c \cdot w_k]_d\right]_d = \left[c \cdot \left(\sum_{k=0}^{N-1} B_{i_1 + (i_2-1) \cdot n + 1,\, k+1} \cdot w_k\right)\right]_d$$

is a valid encryption of $\mathrm{coeff}(\kappa_{i_2}, i_1)$. Note that these quantities are obtained as the sum of at most $N$ ciphertexts, which implies that the original $c$ has to be an encryption of $C(\theta)$ with $\|C(\theta)\|_\infty < \mu/((s+1) \cdot N)$ for Algorithm 1 to recrypt correctly. The second algorithm thus first computes the $n \cdot l$ constants (the $w_i$ are no longer required)

$$w_{i_1,i_2} = \sum_{k=0}^{N-1} B_{i_1 + (i_2-1) \cdot n + 1,\, k+1} \cdot w_k \pmod d,$$

and then computes the recryptions $C_{i_1,i_2} = \mathrm{BitRecrypt}([c \cdot w_{i_1,i_2}]_d, pk)$. Notice how we have reduced the number of calls to recrypt from $N$ down to $n \cdot l$, and that we require only $n \cdot l$ constants instead of the $N$ constants $w_i$. The result is summarized in Algorithm 3. Assuming the $\Gamma_{n,l}(0, \ldots, 0, Y^{i_1}, 0, \ldots, 0)|_{\alpha}$ and the $w_{i_1,i_2}$ are precomputed, the total cost of recrypting a ciphertext is essentially $n \cdot l \cdot T_{\mathrm{bits}} + 2 \cdot n \cdot l \cdot T_{\mathrm{mod}\,d}$.
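The precomputation of the constants $w_{i_1,i_2}$ can be sketched on plain integers; the check below is simply the distributivity that underlies the homomorphic argument, showing why one product $c \cdot w_{i_1,i_2}$ per row can replace up to $N$ products $c \cdot w_k$. The random toy values ($d$, $N$, $n$, $l$), the 0-based row numbering in code and the function names are assumptions of this illustration.

```python
import random


def row_index(i1, i2, n):
    """0-based row of B holding coefficient i1 (0 <= i1 < n) of component i2 (1 <= i2 <= l).

    This is the 1-based index i1 + (i2 - 1) * n + 1 of the text, shifted down by one.
    """
    return i1 + (i2 - 1) * n


def combined_constants(B, w, d):
    """Precompute w_{i1,i2} = sum_k B[row][k] * w_k (mod d) for every row of the binary matrix B."""
    return [sum(w_k for bit, w_k in zip(row, w) if bit) % d for row in B]


def rows_agree(B, w, c, d):
    """Check sum_k B[row][k] * (c * w_k mod d) == c * w_row (mod d) for every row.

    The left side needs one product c * w_k per selected k (up to N of them);
    the right side needs a single product per row once w_row is precomputed.
    """
    w_rows = combined_constants(B, w, d)
    return all(
        (sum(c * w_k % d for bit, w_k in zip(row, w) if bit) - c * w_row) % d == 0
        for row, w_row in zip(B, w_rows)
    )


def random_instance(seed, d=1_000_003, N=8, n=2, l=2):
    """Toy data: a random (n*l) x N binary matrix B, constants w_k and a value c."""
    rng = random.Random(seed)
    B = [[rng.randrange(2) for _ in range(N)] for _ in range(n * l)]
    w = [rng.randrange(d) for _ in range(N)]
    return B, w, rng.randrange(d), d
```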


Algorithm 3: Recrypting Ciphertext c, version 2

    c' ← 0.
    for i_1 from 0 upto n − 1 do
        for i_2 from 1 upto l do
            C_{i_1,i_2} ← BitRecrypt([c · w_{i_1,i_2}]_d, pk).
            c' ← c' ⊕ C_{i_1,i_2} · Γ_{n,l}(0, ..., 0, Y^{i_1}, 0, ..., 0)|_α.
    return (c').

So far we have not exploited the SIMD capabilities of the somewhat homomorphic scheme. Therefore our next goal is to produce the recryptions $C_{i_1,i_2}$ in parallel for $i_2 = 1, \ldots, l$. Thus we aim to compute a ciphertext $\bar{C}_{i_1}$ from $c$ such that $\bar{C}_{i_1}$ represents a recryption of the message

$$(\mathrm{coeff}(\kappa_1, i_1), \ldots, \mathrm{coeff}(\kappa_l, i_1)),$$

where $c$ represents an encryption of $(\kappa_1, \ldots, \kappa_l)$. We use the notation $\bar{C}_{i_1}$ to distinguish it from the recryptions $C_{i_1,i_2}$ above.

The key observation is that the recrypt procedure is the evaluation of a binary circuit, and that this binary circuit is identical (bar the constants) no matter which component we are recrypting. In addition, the algebra splits into (at least) $l$ finite fields of characteristic two, so we can embed the binary circuit into each of these $l$ components and perform the associated recryptions in parallel. For a fixed $i_1$ we therefore want to execute the computation of the vector

$$\left([c \cdot w_{i_1,1} \cdot \zeta_0]_d \pmod 2, \ldots, [c \cdot w_{i_1,l} \cdot \zeta_0]_d \pmod 2\right)$$

in the encrypted domain in parallel. Recall that each component of this vector is computed as


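$$[c \cdot w_{i_1,k} \cdot \zeta_0]_d \pmod 2 = \left(\bigoplus_{i=1}^{s} \bigoplus_{j=0}^{S-1} b_{i,j} \cdot y^{(k)}_{i,j} \pmod 2\right) \oplus \left(\left\lfloor \sum_{i=1}^{s} \sum_{j=0}^{S-1} b_{i,j} \cdot z^{(k)}_{i,j} \right\rceil \pmod 2\right),$$

where $y^{(k)}_{i,j} = [c \cdot w_{i_1,k}]_d \cdot x_i \cdot R^j \pmod d$ and $z^{(k)}_{i,j}$ is an approximation of $y^{(k)}_{i,j}/d$ up to $p$ bits after the binary point. Recall that to obtain the bit $B_k = \left\lfloor \sum_{i=1}^{s} \sum_{j=0}^{S-1} b_{i,j} \cdot z^{(k)}_{i,j} \right\rceil \pmod 2$ we used the function school_book_add(M) with input an $s \times (p+1)$ array $M$ whose $i$-th row contained $\sum_{j=0}^{S-1} b_{i,j} \cdot \mathrm{compute\_bits}(y^{(k)}_{i,j})$. In fact, $B_k$ was simply the first bit in the bit vector returned by school_book_add(M).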

If we now want to execute the above computation in the $k$-th component (instead of the first), we basically have to multiply everything by $\Gamma_{n,l}(0, \ldots, 0, 1, 0, \ldots, 0)|_{\alpha}$, where $(0, \ldots, 0, 1, 0, \ldots, 0)$ is the vector of $l$ elements of $\mathbb{K}_n$ whose $k$-th element is equal to one, with all other elements being zero. To avoid costly modular multiplications by $\Gamma_{n,l}(0, \ldots, 0, 1, 0, \ldots, 0)|_{\alpha}$, we use $l$ different encryptions of $b_{i,j}$, depending on which of the $l$ components of the algebra we are using. In particular, we no longer augment the public key with the data

$$\left(s, S, \{x_i\}_{i=1}^{s}, R, \{c_{i,j}\}_{i=1, j=0}^{s, S-1}\right),$$

where $c_{i,j} \leftarrow \mathrm{Encrypt}(b_{i,j}, pk)$, but instead replace the $c_{i,j}$ components with elements $c_{i,j,k}$, where

$$c_{i,j,k} \leftarrow \mathrm{Encrypt}\left(b_{i,j} \cdot \Gamma_{n,l}(0, \ldots, 0, 1, 0, \ldots, 0), pk\right) \quad \text{for } 1 \le i \le s,\ 0 \le j < S,\ 1 \le k \le l,$$

with the $1$ in position $k$. This means we need to increase the size of the augmented public key by essentially a factor of $l$. Once we have computed all the $\bar{C}_{i_1}$ we can simply recover the recryption $c'$ of $c$ by computing

$$c' \leftarrow \sum_{i_1=0}^{n-1} \bar{C}_{i_1} \cdot \Gamma_{n,l}(Y^{i_1}, \ldots, Y^{i_1})|_{\alpha}.$$

The resulting algorithm is given in Algorithm 4. Note that to compute each $\bar{C}_{i_1}$ we only require one call to the function school_book_add(A), compared to $l$ calls in Algorithm 3.


Algorithm 4: Recrypting Ciphertext c, version 3: parallel recryption of all i_1-th coefficients of the l elements embedded in a ciphertext c

    c' ← 0.
    for i_1 from 0 upto n − 1 do
        sum ← 0.
        A ← 0, where A ∈ M_{s×(p+1)}(Z_d).
        for i_2 from 1 upto l do
            c_{i_1,i_2} ← c · w_{i_1,i_2} (mod d).
            for j from 1 upto s do
                y ← c_{i_1,i_2} · x_j (mod d).
                for k from 0 upto S − 1 do
                    if y is odd then
                        sum ← sum ⊕ c_{j,k,i_2}.
                    b ← compute_bits(y).
                    for u from 0 upto p do
                        A_{j,u} ← A_{j,u} ⊕ (b_u · c_{j,k,i_2}).
                    y ← y · R (mod d).
        a ← school_book_add(A).
        C̄_{i_1} ← sum ⊕ a_0.
        c' ← c' ⊕ C̄_{i_1} · Γ_{n,l}(Y^{i_1}, ..., Y^{i_1})|_α.
    return (c').


We let $T_{\mathrm{par}}(n, l)$ denote the cost of performing this recryption operation on a message consisting of $l$ field elements from $\mathbb{K}_n$ held in parallel. Assuming the $\Gamma_{n,l}(Y^{i_1}, \ldots, Y^{i_1})|_{\alpha}$ and the $w_{i_1,i_2}$ are precomputed, we obtain that

$$T_{\mathrm{par}}(n, l) \approx n \cdot (S \cdot s \cdot l + s \cdot l + l + 1) \cdot T_{\mathrm{mod}\,d} + n \cdot T_{\mathrm{school\_book\_add}}.$$

The main cost advantage therefore stems from the fewer calls to the function school_book_add.
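The cost formulas for the three recryption variants can be collected into a small calculator that counts multiplications modulo $d$ and ignores additions, as the text does. The school-book cost $s \cdot 2^p + \sum_{k=1}^{p-1}(s+k) \cdot 2^{p-k}$, the function names and the sample parameters in the checks are assumptions of this sketch, not values from the experiments.

```python
def t_school_book_add(s, p):
    """School-book addition cost in multiplications mod d: s*2^p + sum_{k=1}^{p-1} (s+k)*2^(p-k)."""
    return s * 2**p + sum((s + k) * 2**(p - k) for k in range(1, p))


def t_bits(S, s, p):
    """Algorithm 1: (S+1)*s products to generate the y values, plus one school-book addition."""
    return (S + 1) * s + t_school_book_add(s, p)


def t_naive(N, S, s, p):
    """Algorithm 2: N bit recryptions plus 2N products mod d."""
    return N * t_bits(S, s, p) + 2 * N


def t_version2(n, l, S, s, p):
    """Algorithm 3: n*l bit recryptions plus 2*n*l products mod d."""
    return n * l * t_bits(S, s, p) + 2 * n * l


def t_par(n, l, S, s, p):
    """Algorithm 4: n*(S*s*l + s*l + l + 1) products mod d plus n school-book additions."""
    return n * (S * s * l + s * l + l + 1) + n * t_school_book_add(s, p)
```

With $l = 1$ the costs of Algorithms 3 and 4 coincide; in general Algorithm 4 saves $n \cdot (l-1) \cdot (T_{\mathrm{school\_book\_add}} + 1)$ multiplications, which is exactly the avoided school-book additions plus one product per avoided final multiplication.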

Naively, it would appear that our parallel version of recrypt, using Algorithm 4, is more efficient than the naive version using Algorithm 2. However, one may need larger public keys to actually implement the parallel recryption (as it is a more complex circuit). We also need to check whether doing operations in parallel and on large data entries (via the algebra $\mathcal{A}$) is more efficient than doing the same operations on bits, using the standard bit-wise FHE scheme but with more complex circuits. It is to this topic that we now turn, by examining some "toy" examples for small security parameters.


6  Security  Analysis  and  Parameters 

Gentry's analysis of the basic FHE scheme and the associated bootstrapping operation applies in our situation. The security of the underlying somewhat homomorphic scheme is based on the hardness of a variant of the bounded distance decoding problem (BDDP), whereas the security of the bootstrapping procedure is based on the sparse subset sum problem (SSSP). Indeed, the minor modifications we make to the public key in the previous sections result in exactly the same security reductions. Thus an adversary against the scheme can be turned either into an algorithm to solve a decision variant of the BDDP, or into an algorithm to solve an SSSP.

When selecting key sizes for cryptographic schemes, in practice one almost always selects key sizes based on the best known attacks and not on the hard problems to which security reduces. We have various parameters we need to select: $s$, $S$, $N$, $t$ and $\mu$. The sizes of $N$, $t$ and $\mu$ determine whether one can break the scheme by distinguishing ciphertexts, or (more seriously) by message or key recovery. Parameter selection here is based on the hardness of solving explicit closest vector problems (CVPs) in lattices of dimension $N$, involving basis matrices with coefficients bounded by $d$ (a function of $t$ and $N$), and for close vectors whose distance to the lattice is related to the size of $\mu$. An algorithm to solve the CVP/BDDP can be directly used to recover plaintexts, as explained in [23]. The larger the ratio of $t$ to $\mu$ the easier it is to recover plaintexts, but the ratio of $t$ to $\mu$ also determines how complicated a circuit the basic somewhat homomorphic scheme can evaluate. Indeed, the smaller the ratio of $t$ to $\mu$, the less expressive our somewhat homomorphic scheme is. In selecting $N$, $t$ and $\mu$ one needs to make a careful analysis of the current state of the art in lattice basis reduction, a topic which is beyond the scope of this paper.

On the other hand, it is not the case that an algorithm to solve the sparse subset sum problem can be used to break the scheme. The security proof in [11] uses the FHE adversary to solve the following SSSP:

$$\zeta_0 = \sum_{i=1}^{s} \sum_{j=0}^{S-1} b_{i,j} \cdot (x_i \cdot R^j) \pmod d.$$

The simulator (solving the SSSP) is given $\zeta_0$ and the weights $x_i \cdot R^j \pmod d$, and uses random ciphertexts $c_{i,j}$ to represent the encryptions of the $b_{i,j}$. Since the proof has already shown that ciphertexts of specific values are indistinguishable from encryptions of random values, the adversary does not know it is in a simulation. The proof in [11] shows how the simulator can then solve the SSSP. Whilst this easily establishes the fact that the recrypt procedure does not reduce the security of the scheme, assuming of course that the scheme is KDM secure and the SSSP is hard, it actually tells us very little in practice. In particular it says: "If the adversary knows the secret key, then recovering another representation of the secret key is equivalent to solving the SSSP".

The parameters $s$ and $S$ determine (in practice) a hidden sparse subset sum problem rather than a standard SSSP. Namely, the adversary needs to solve the above subset sum problem without being given access to the value $\zeta_0$. Taking the pragmatic view of parameter selection based on the best known attack, it is clear that neither the lattice attacks on the SSSP nor the time-memory trade-off methods to solve the SSSP apply in the hidden case. This has important direct implications for parameter size selection. For example, if a time-memory trade-off were possible then we would need to select $S$ and $s$ such that $S^{s/2} > 2^{\lambda}$, where we do not believe the adversary can perform $2^{\lambda}$ operations. However, since a time-memory trade-off against the hidden SSSP appears impossible, we can instead select $S^s > 2^{\lambda}$.
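To see what the two conditions mean for the set size, the following sketch searches for the smallest $S$ meeting each condition for a given $s$ and $\lambda$. It assumes, as in the preceding paragraph, that a time-memory trade-off gives a square-root speed-up; the function names and the sample values $s = 4$, $\lambda = 20$ are hypothetical.

```python
import math


def sssp_security_bits(s, S, hidden=True):
    """log2 of the work for a sparse subset sum with s sets of size S.

    hidden=True  -> S^s      (no time-memory trade-off, the hidden case)
    hidden=False -> S^(s/2)  (square-root speed-up from a time-memory trade-off)
    """
    exponent = s * math.log2(S)
    return exponent if hidden else exponent / 2


def min_S(s, lam, hidden=True):
    """Smallest set size S whose work is strictly above 2^lam."""
    S = 2
    while sssp_security_bits(s, S, hidden) <= lam:
        S += 1
    return S
```

In this toy setting the hidden-SSSP condition is met with $S = 33$, whereas the trade-off condition would force $S = 1025$, roughly the square of the former.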

This observation has a number of consequences. Firstly, we can select $S$ to be much smaller than Gentry and Halevi do. Secondly, this means we do not need to complicate the recryption procedure with the index encoding method they use to save space, since $S$ is now small enough not to require it. Thirdly, this halves the degree of the resulting recryption circuit, which makes the scheme more efficient. Fourthly, it saves on the computational cost of recryption, since we need to do less work.

In summary: in practice one should select $N$, $t$ and $\mu$ according to best practice from lattice basis reduction. For real systems this means that parameters need to be chosen that are significantly larger than the toy examples presented by Gentry and Halevi. However, when selecting $s$ and $S$ one can be less conservative than Gentry and Halevi.

In Section 5 we detailed a parallel recryption procedure which has the same multiplicative depth as the one above, but which requires more addition operations, where the number of extra additions depends on the level of SIMD operations required. Thus the value of $t$ may need to be larger than that required in non-SIMD-based schemes. Asymptotically this constant increase makes no difference, but for "practical" parameters one may see a noticeable difference. We therefore now turn to presenting experimental results for "toy" security levels. This is done purely to show that our algorithms make a difference even for choices of $N$, $\mu$ and $t$ corresponding to low security levels.


7  Experimental  Results 

So the question arises as to whether it is simpler to perform FHE on bits, or to perform FHE via the algebra $\mathcal{A}$. In this section we concentrate on estimating the performance in terms of the run time and the sizes of the resulting ciphertexts which need to be stored. First recall key generation: we choose $N$ and a polynomial $F(X)$ with small coefficients, and we then choose an element $\gamma \in \mathbb{Z}[\theta]$ which has coefficients of order $2^t$. This results in a value for $d$ whose bit length is approximately $N \cdot t$ plus a term of order $N \log N$; thus we require roughly $O(N \cdot (t + \log N))$ bits to represent a single ciphertext.
We first let $T(n)$ denote the function which returns the number of $\mathbb{F}_2$ multiplications needed to perform a multiplication in the field $\mathbb{K}_n = \mathbb{F}_{2^n}$. Using Karatsuba multiplication (for example) we find, for $n$ a power of two, that

$$T(n) = \begin{cases} 1 & \text{if } n = 1, \\ 3 \cdot T(n/2) & \text{otherwise.} \end{cases}$$

This is clearly only an estimate of the overall cost, as we are ignoring the required additions and management of the data.
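A small sketch of Karatsuba over $\mathbb{F}_2[Y]$ makes the count $T(n)$ concrete: it multiplies two polynomials of degree less than $n$ while counting bit multiplications (ANDs), and the result can be compared with the school-book product. Reduction modulo the polynomial defining $\mathbb{K}_n$ is $\mathbb{F}_2$-linear (XORs only), so it needs no further multiplications and is omitted here; the function names are illustrative.

```python
def karatsuba_mults(n):
    """T(n): F_2 multiplications used by Karatsuba for n a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return 1 if n == 1 else 3 * karatsuba_mults(n // 2)


def karatsuba_gf2(a, b, counter):
    """Multiply F_2[Y] polynomials a, b (coefficient lists, Y^0 first, equal power-of-two length).

    counter[0] is incremented once per F_2 multiplication (AND of two bits).
    """
    n = len(a)
    if n == 1:
        counter[0] += 1
        return [a[0] & b[0]]
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    low = karatsuba_gf2(a0, b0, counter)
    high = karatsuba_gf2(a1, b1, counter)
    cross = karatsuba_gf2([x ^ y for x, y in zip(a0, a1)],
                          [x ^ y for x, y in zip(b0, b1)], counter)
    cross = [m ^ lo ^ hi for m, lo, hi in zip(cross, low, high)]  # a0*b1 + a1*b0 over F_2
    out = [0] * (2 * n - 1)
    for offset, part in ((0, low), (h, cross), (2 * h, high)):
        for i, v in enumerate(part):
            out[i + offset] ^= v
    return out


def schoolbook_gf2(a, b):
    """Reference product in F_2[Y] using len(a) * len(b) bit multiplications."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] ^= x & y
    return out
```

For $n = 8$ this uses $T(8) = 27$ bit multiplications instead of the 64 of the school-book method.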

There are various options for implementing operations on $l'$ finite fields, each of size $2^{n'}$. In the following discussion we concentrate on four options; clearly other options are available, but we select these as a way of demonstrating the different ways in which our techniques could be used.

Option 1: We operate on bits using the standard bit-wise FHE schemes, i.e. we take $n = l = 1$ in our FHE scheme. We then require $l' \cdot n' \cdot t \cdot N$ bits to store our $l'$ finite field elements, and a single SIMD-style multiplication on the $l'$ finite fields will cost around $l' \cdot T(n') \cdot T_{\mathrm{bits}}$.

Option 2: We operate on the $l'$ finite field elements where each element uses a single ciphertext, i.e. we take $n = n'$ and $l = 1$ in our FHE scheme. This option has the benefit that we can work with the finite field, but we are not forced to operate in a SIMD manner all the time. With this option we require $l' \cdot t \cdot N$ bits to store our $l'$ finite field elements, and a single SIMD-style multiplication on the $l'$ finite fields will cost around $l' \cdot T_{\mathrm{par}}(n', 1)$.

Option 3: We operate on all $l'$ finite fields in a SIMD fashion using only a single ciphertext, i.e. we take $n = n'$ and $l = l'$ in our FHE scheme. Thus we require $t \cdot N$ bits to store our $l'$ finite field elements, and a single SIMD-style multiplication on the $l'$ finite fields will cost around $T_{\mathrm{par}}(n', l')$.

Option 4: Here we operate on bits, but in a SIMD fashion, by having a ciphertext represent $l'$ bits, i.e. we take $n = 1$ and $l = l'$ in our FHE scheme. With this option we require $n' \cdot t \cdot N$ bits to store the $l'$ finite field elements, and a SIMD-style multiplication will cost around $T(n') \cdot T_{\mathrm{par}}(1, l')$.

We summarize the above choices, for the concrete parameters $n' = 8$ and $l' = 16$, in the following table. We select a value for $N$ of around 2000, purely to enable comparison with the work of [12]. We reiterate that this value is purely for illustrative purposes, to show the difference between the various options; it should not be taken to indicate that $N \approx 2000$ gives an adequate security level. Fixing $n'$, $l'$ and $N$ rather than leaving them variable is done because the overhead of the SIMD operations crucially depends on the specific combination of finite field and cyclotomic field chosen, and has no nice asymptotic meaning. We select a single parameter instance simply so as not to overwhelm the reader with data, since our goal is purely to show the feasibility of our algorithms even at low security levels.

Note that for Option 1 we select $N = 2048$, since if we are only encrypting bits then using the polynomial $F(X) = X^{2048} + 1$ will always be more efficient than using $F(X) = \Phi_{3485}(X)$. In addition we keep the parameter $t$ as an indeterminate, as we will return to it later.


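| Option | $N$ | Ciphertext space (≈ bits) | Approx. runtime cost |
|---|---|---|---|
| Option 1 | 2048 | $262144 \cdot t$ | $432 \cdot T_{\mathrm{bits}}$ |
| Option 2 | 2560 | $40960 \cdot t$ | $16 \cdot T_{\mathrm{par}}(8, 1)$ |
| Option 3 | 2560 | $2560 \cdot t$ | $T_{\mathrm{par}}(8, 16)$ |
| Option 4 | 2560 | $20480 \cdot t$ | $27 \cdot T_{\mathrm{par}}(1, 16)$ |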
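The entries of the table follow mechanically from the four option descriptions; the sketch below recomputes them, with the Karatsuba count $T(n')$ supplying the factors 432 and 27. The function and parameter names are illustrative assumptions, and the cost column is returned as a symbolic string since $T_{\mathrm{bits}}$ and $T_{\mathrm{par}}$ are only known after implementation.

```python
def karatsuba_mults(n):
    """T(n) for n a power of two: 1 if n == 1, else 3 * T(n / 2)."""
    return 1 if n == 1 else 3 * karatsuba_mults(n // 2)


def option_summary(n_p, l_p, N_bits, N_simd):
    """Ciphertext bits (as a multiple of t) and approximate cost for Options 1-4.

    N_bits is the dimension used for plain bit-wise FHE (Option 1) and N_simd
    the dimension used for the other options.
    """
    T = karatsuba_mults(n_p)
    return {
        1: (l_p * n_p * N_bits, f"{l_p * T} * T_bits"),
        2: (l_p * N_simd, f"{l_p} * T_par({n_p}, 1)"),
        3: (N_simd, f"T_par({n_p}, {l_p})"),
        4: (n_p * N_simd, f"{T} * T_par(1, {l_p})"),
    }
```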

Thus if one is solely interested in reducing the memory of the calculation, one would select Option 3. To determine which option is most efficient one needs to actually implement the schemes, since the actual costs of each operation depend on the value of $t$ needed. So we implemented the above algorithms for the four cases $(N, n, l) = (2048, 1, 1)$, $(2560, 8, 1)$, $(2560, 8, 16)$ and $(2560, 1, 16)$, so as to compare the four options in the above analysis.

In all cases we found that taking $t = 400$ resulted in a scheme in which we were able to recrypt clean ciphertexts; however, to enable fully homomorphic encryption we need to recrypt dirty ciphertexts, and be able to perform some additional operations. For the first two of our four cases we found that $t = 600$ was sufficient, whilst for the last two we found that $t = 800$ was sufficient; note that we increased $t$ in multiples of 100, so smaller values could have been sufficient.

In the four cases we found the following recrypt times. We also present, assuming we wished in all cases to implement operations on $l' = 16$ values in $\mathbb{F}_{2^{n'}}$, where $n' = 8$, the actual time needed to perform a multiplication in $\mathbb{F}_{2^8}$ followed by a full recrypt, and the total size of all ciphertexts needed to represent such data. In our implementation of the field algorithms for Option 1 and Option 4 we used the Karatsuba method mentioned above, and only performed recryption when implementing a multiplication using the FHE scheme, i.e. recryption was not performed upon additions. The algorithms were implemented in C++ using the NTL library and were run on a machine with six Intel Xeon 2.4 GHz processors and 48 GB of RAM.
Columns 1 to 4 describe the basic FHE scheme; columns 5 to 7 refer to performing operations for $(n', l') = (8, 16)$.

| $(N, n, l)$ | $t$ | $(p, s)$ | Recrypt time (sec) | Method | Mult & Recrypt time (sec) | Ciphertext size |
|---|---|---|---|---|---|---|
| (2048, 1, 1) | 600 | (4, 32) | 15 | Option 1 | 7148 | 18.00 MB |
| (2560, 8, 1) | 600 | (4, 32) | 187 | Option 2 | 2983 | 3.00 MB |
| (2560, 8, 16) | 800 | (4, 32) | 723 | Option 3 | 735 | 0.25 MB |
| (2560, 1, 16) | 800 | (4, 32) | 89 | Option 4 | 2406 | 2.00 MB |

The larger $t$ value is needed in the last two examples due to the increased complexity of the underlying recryption circuit. We end by noting the following. In our toy example we see that SIMD operations and parallel recryption offer some performance advantages. The exact benefit depends on a number of factors. Firstly, the sizes of $n'$ and $l'$; these are determined by an application and are often small. In turn, $n'$ and $l'$ affect the choice of $N$, which also depends on the desired security level. The precise values of $t$ and $\mu$ allowed are then determined by security analysis of lattice problems. Our toy experiments show that our ability to perform SIMD operations does not affect the size of $t$ very much, and that the parallel recryption operation is as practical as standard recryption.

The exact choice of which option is best, however, depends on the application. Just as with SIMD versus non-SIMD operations on a standard processor, whether one utilizes the SIMD instructions in a program depends on the precise program being run.


8  Possible  Applications 

Before discussing two possible applications we note that one issue with SIMD operations on data is that sometimes we wish to move data between the various elements in the $l$ values on which we are operating. This is often a problem, since the hardware/mathematics/software which supports the SIMD operations precludes such operations. However, in our FHE scheme such operations can be performed at no additional cost.

Indeed, given a SIMD word consisting of $l$ elements in a finite field $\mathbb{F}_{2^n}$, one can produce a new SIMD word which consists of any linear function of the bits making up the original SIMD word. To see this, notice that it simply requires multiplying the matrix $B$ used in the parallel recrypt procedure by the matrix defining the linear map. Thus we can perform this linear function as part of the recryption performed for the previous operation. In particular, this means we can shuffle the elements in our SIMD word, extract specific elements, extract specific bits, and so on. Indeed, extracting specific bits in parallel was the core of our parallel recrypt procedure explained above. Note that this ability to shift around elements and extract elements from a SIMD word comes from the recryption procedure; in BGV-style schemes these operations can be accomplished algebraically on the SIMD word via the homomorphic application of Galois automorphisms; see [13] for further details.
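The composition argument can be checked on plain bits: applying a linear map $L$ after $B$ gives the same result as using the single matrix $L \cdot B$ over $\mathbb{F}_2$. The sketch uses a rotation of the $l$ components as the linear map. The matrices, the 0-based component numbering and the function names are assumptions of this illustration, and $B$ here is an arbitrary binary matrix rather than the actual matrix representing $\Gamma_{n,l}^{-1}$.

```python
def matmul_gf2(X, Y):
    """Matrix product over F_2 (entries are 0/1 ints)."""
    return [[sum(X[i][k] & Y[k][j] for k in range(len(Y))) & 1 for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec_gf2(X, v):
    """Matrix-vector product over F_2."""
    return [sum(x & y for x, y in zip(row, v)) & 1 for row in X]


def rotate_components(n, l, shift):
    """(n*l) x (n*l) permutation matrix moving component i to component (i + shift) mod l.

    Components are numbered from 0 here; row i1 + i*n holds coefficient i1 of component i.
    """
    size = n * l
    P = [[0] * size for _ in range(size)]
    for comp in range(l):
        for i1 in range(n):
            P[i1 + ((comp + shift) % l) * n][i1 + comp * n] = 1
    return P
```

Folding $P$ into $B$ once means the rotation costs nothing extra at recryption time, which is the point made above.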

We now turn to our two examples. The first example, namely homomorphic evaluation of AES under some homomorphic key, is used to demonstrate how SIMD operations in high-level ($\mathbb{F}_{2^8}$) algebraic structures allow us to evaluate complex operations relatively easily. Evaluation of AES circuits using FHE operations has been mentioned as a possible usage scenario in [19]. The second example, database lookup, shows how data can be searched using SIMD-style operations more efficiently than using the bit-wise homomorphic operations envisaged in [11].

In this section we assume that all operations are followed by the recryption operation as post-processing. Thus we are no longer interested in the size of the circuit which implements a functionality, but simply in the cost of the operations involved. As explained above, we have essentially three key operations: the two algebraic operations Mult and Add, plus the linear operations on bits mentioned above. We denote the cost of these three operations by $C_M$, $C_A$ and $C_L$, and we note that $C_L$ essentially comes for free as part of recryption. For example, if an operation requires two multiplications, one addition and three linear operations, we denote this cost (for simplicity) by $2 \cdot C_M + C_A + 3 \cdot C_L$.


8.1  Bit-Slicing 

Any algorithm that is expressed as a circuit of bit operations can be run on many inputs at once by executing it with a set of parameters that supports operations on multiple bits in parallel. When applied to a single algorithm, this technique is often called bit-slicing; it is, however, essentially a bit-wise form of SIMD operation. Hence, any application performed using an FHE algorithm that supports the parallel recrypt procedure in this paper could potentially be sped up by at least an order of magnitude by running multiple instances of the same algorithm in parallel.
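The following plaintext sketch makes the bit-slicing idea concrete. An ordinary Python integer plays the role of the SIMD word: bit $k$ of each packed word belongs to instance $k$, so a single evaluation of a circuit processes all instances at once. The full-adder circuit and the choice of 64 instances are illustrative assumptions, and no encryption is involved.

```python
import random

def full_adder(a, b, c):
    # The same Boolean circuit works on single bits or on bit-sliced words.
    s = a ^ b ^ c
    carry = (a & b) | (c & (a ^ b))
    return s, carry

def pack(bits):
    # Bit k of the word holds the input bit of instance k.
    word = 0
    for k, bit in enumerate(bits):
        word |= (bit & 1) << k
    return word

def unpack(word, count):
    return [(word >> k) & 1 for k in range(count)]

INSTANCES = 64
rng = random.Random(2024)
A = [rng.randint(0, 1) for _ in range(INSTANCES)]
B = [rng.randint(0, 1) for _ in range(INSTANCES)]
Cin = [rng.randint(0, 1) for _ in range(INSTANCES)]

# One evaluation of the circuit on packed words handles all 64 instances.
s_word, c_word = full_adder(pack(A), pack(B), pack(Cin))
sliced = list(zip(unpack(s_word, INSTANCES), unpack(c_word, INSTANCES)))

# Reference: evaluate the circuit 64 times, one instance at a time.
scalar = [full_adder(a, b, c) for a, b, c in zip(A, B, Cin)]
print(sliced == scalar)
```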
8.2 Application to AES


To illustrate the benefits of a SIMD-enabled FHE scheme over traditional bitwise FHE, we examine how one would implement an AES functionality using FHE. Namely, we want a server to encrypt a message using a key that is only available via an FHE encryption. AES has also recently been suggested as a relatively complex example application of secure computation for a number of related technologies, namely two-party and multi-party MPC [8,20]. Its overall design also makes it particularly well suited to SIMD execution. Indeed, in [15] the authors extend the ideas of this section to the BGV system and present actual running times for a fully homomorphic evaluation of the AES circuit.

The method we propose is to encode the entire AES state matrix in a single ciphertext. Recall that the state matrix is a 4-by-4 matrix of elements in $\mathbb{F}_{2^8}$. We therefore first need to select an $m$ such that the ideal $(2)$ splits into at least 16 prime ideals of degree divisible by eight in the field defined by $\Phi_m(X)$. There are many such examples, including the one used throughout this paper, $m = 3485$. Since $\ell$ is equal to $64 = 4 \times 16$, we could also perform 4 AES computations in parallel, although for ease of exposition we restrict ourselves to one. In terms of our previous section, we let $K_8 = \mathbb{F}_{2^8}$ denote the standard representation of $\mathbb{F}_{2^8}$, i.e.

$$K_8 := \mathbb{F}_2[X]/(X^8 + X^4 + X^3 + X + 1),$$

and we let $A$ denote the algebra consisting of 64 copies of $\mathbb{F}_{2^{40}}$, each with the representation induced by the corresponding factor of $\Phi_m(X) \pmod 2$.
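The parameter claims for $m = 3485$ can be checked with a few lines of arithmetic. For odd $m$, the degree of each prime ideal above 2 is the multiplicative order $d$ of 2 modulo $m$, and the number of slots is $\ell = \phi(m)/d$. This Python sketch uses trial-division factoring and a naive order computation, so it is only suitable for small $m$.

```python
from math import gcd

def euler_phi(m):
    # Euler's phi-function by trial-division factoring.
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def multiplicative_order(a, m):
    # Smallest d >= 1 with a^d = 1 (mod m); requires m > 1 and gcd(a, m) = 1.
    if m < 2 or gcd(a, m) != 1:
        raise ValueError("a must be a unit modulo m > 1")
    d, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        d += 1
    return d

m = 3485
N = euler_phi(m)                 # degree of Phi_m(X)
d = multiplicative_order(2, m)   # degree of each prime ideal above 2
ell = N // d                     # number of SIMD slots
print(N, d, ell, d % 8 == 0)     # 2560 40 64 True
```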

We assume the AES state matrix is given by

$$\begin{pmatrix} s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\ s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\ s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\ s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3} \end{pmatrix},$$

which we encode as the element $(s_{0,0}, s_{0,1}, \ldots, s_{3,3})$ of $K_8^{16}$. Using the map $\Gamma_{8,16}$, we obtain an element of $A$, which can then be evaluated at $\alpha$ modulo $p$ to obtain a trivial encryption of the message state (before the first round).

To implement AES, we assume that the round keys have been presented in encrypted form, using the above embedding via $\Gamma_{8,16}$. Computing the round keys from a given key can be done using the same operations needed to execute the rounds. Thus, if we can implement the rounds using efficient Fully Homomorphic SIMD (FH-SIMD) operations, then we can also compute the encryptions of the round keys given the initial key.

The  round  structure  of  AES  is  made  up  of  four  basic  operations,  which  we  now  discuss  in  turn. 


8.2.1  AddRoundKey 

This is the simplest operation. It is clearly performed for all sixteen bytes in parallel by a single $\oplus$ operation of the FHE scheme, at a cost of $C_a$.


8.2.2  ShiftRows 

In this operation, row $i$ (numbering rows from 0) is shifted left by $i$ positions. This is clearly an example of a linear operation from earlier, in that we map the ciphertext corresponding to

$$(s_{0,0}, s_{0,1}, s_{0,2}, s_{0,3}, s_{1,0}, s_{1,1}, s_{1,2}, s_{1,3}, s_{2,0}, s_{2,1}, s_{2,2}, s_{2,3}, s_{3,0}, s_{3,1}, s_{3,2}, s_{3,3})$$

into a ciphertext corresponding to

$$(s_{0,0}, s_{0,1}, s_{0,2}, s_{0,3}, s_{1,1}, s_{1,2}, s_{1,3}, s_{1,0}, s_{2,2}, s_{2,3}, s_{2,0}, s_{2,1}, s_{3,3}, s_{3,0}, s_{3,1}, s_{3,2}).$$

Since this is a reordering, the cost is given by $C_l$.
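In plaintext terms, ShiftRows is just a fixed permutation of the 16 slots of the row-major encoding above. The following Python sketch reproduces the target ordering, using string labels in place of bytes; it is plaintext only. In the scheme itself, this permutation would be carried out by the linear operation applied during recryption.

```python
def shift_rows(slots):
    # slots holds the state row by row: slots[4*i + j] = s_{i,j}.
    # Row i (rows numbered from 0) is rotated left by i positions.
    out = []
    for i in range(4):
        row = slots[4 * i:4 * i + 4]
        out.extend(row[i:] + row[:i])
    return out

labels = [f"s{i}{j}" for i in range(4) for j in range(4)]
print(shift_rows(labels))
```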
8.2.3 MixColumns


In this step we multiply the state matrix on the left by the fixed matrix

$$\begin{pmatrix} X & X+1 & 1 & 1 \\ 1 & X & X+1 & 1 \\ 1 & 1 & X & X+1 \\ X+1 & 1 & 1 & X \end{pmatrix}.$$

This is accomplished in four stages:

1. Compute the trivial encryption $c_1$ of $\Gamma_{8,16}((X, X, \ldots, X))$; clearly this can be precomputed.

2. Compute $c_2 \leftarrow c \otimes c_1$.

3. By application of four linear operations, create ciphertexts $c_3$, $c_4$, $c_5$ and $c_6$ corresponding to $c_2$ shifted up by one row, $c$ shifted up by one row, $c$ shifted up by two rows, and $c$ shifted up by three rows (where the shifts are performed with rotation).

4. Compute $c_2 \oplus c_3 \oplus c_4 \oplus c_5 \oplus c_6$ and output the result.

Notice that our SIMD operations allow us to perform the 16 multiplications of the second step in parallel. The cost of the MixColumns operation is then $C_m + 4 \cdot C_a + 4 \cdot C_l$.
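The four-stage MixColumns recipe can be checked in plaintext. This Python sketch makes the following assumptions:

- It uses the row-major slot encoding above.
- It represents $K_8$ elements as integers 0 to 255, with $X$ corresponding to 0x02.
- It realises "shift up by $k$ rows" as a rotation of the 16-slot list.

The sketch compares the result with a direct multiplication by the fixed matrix and checks the widely used test column db 13 53 45 → 8e 4d a1 bc.

```python
import random

def xtime(a):
    # Multiply by X (the byte 0x02) in K_8 = F_2[X]/(X^8 + X^4 + X^3 + X + 1).
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gf_mul(a, b):
    # Shift-and-add multiplication in K_8.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def shift_up(slots, k):
    # Row-major 4x4 state rotated up by k rows: new row i is old row (i + k) mod 4.
    return slots[4 * k:] + slots[:4 * k]

def mix_columns_simd(c):
    c2 = [gf_mul(0x02, x) for x in c]    # stage 2: slot-wise product with (X, ..., X)
    c3 = shift_up(c2, 1)                  # stage 3: four rotations
    c4 = shift_up(c, 1)
    c5 = shift_up(c, 2)
    c6 = shift_up(c, 3)
    # stage 4: four additions
    return [a ^ b ^ e ^ f ^ g for a, b, e, f, g in zip(c2, c3, c4, c5, c6)]

MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]   # X -> 2, X + 1 -> 3

def mix_columns_direct(c):
    out = [0] * 16
    for i in range(4):
        for j in range(4):
            acc = 0
            for k in range(4):
                acc ^= gf_mul(MIX[i][k], c[4 * k + j])
            out[4 * i + j] = acc
    return out

rng = random.Random(1)
samples = [[rng.randrange(256) for _ in range(16)] for _ in range(20)]
print(all(mix_columns_simd(s) == mix_columns_direct(s) for s in samples))

state = [0] * 16
state[0], state[4], state[8], state[12] = 0xDB, 0x13, 0x53, 0x45   # column 0
out = mix_columns_simd(state)
print([hex(out[4 * i]) for i in range(4)])   # ['0x8e', '0x4d', '0xa1', '0xbc']
```

Using only rotations and additions on whole 16-slot vectors mirrors the cost count $C_m + 4 \cdot C_a + 4 \cdot C_l$: one slot-wise multiplication, four rotations and four additions.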


8.2.4  SubBytes 

This is the most complex of all the AES operations. However, there is much existing literature on straight-line (i.e. branch-free) executions of the AES S-Boxes at byte level. For example, the approach in [4] transforms the polynomial basis into a "nice" normal basis. It then decomposes the arithmetic for inversion into $\mathbb{F}_{2^4}$ and then $\mathbb{F}_{2^2}$ operations. At that point, all the arithmetic consists of logical operations and is therefore amenable to FH-SIMD operations. However, this approach is better suited to real hardware, or to FH-SIMD operations where the basic data type is a bit (e.g. when using, say, $(n, \ell) = (1, 16)$ in our main scheme).

As we are restricted to operations that can be performed efficiently in our scheme, a more naive approach is probably preferable. Recall that the AES S-Box consists of inverting each state byte in $K_8$ (where we define $0^{-1} = 0$), followed by an $\mathbb{F}_2$-affine map (an $\mathbb{F}_2$-linear map plus a constant). Also recall that $x^{-1} = x^{254}$ in the field $K_8$. We can therefore apply the S-Box operation to our encrypted state using the following method:

- $t \leftarrow c$.

- For $i = 1$ to $6$ do

  - $t \leftarrow t \otimes t$.

  - $t \leftarrow t \otimes c$.

- $t \leftarrow t \otimes t$.

- Extract eight ciphertexts $t_0, \ldots, t_7$ such that $t_i$ is the (parallel) encryption of the $i$-th bit of all 16 values in $t$.

- Perform the linear operation on $t_0, \ldots, t_7$ in parallel to produce ciphertexts $s_0, \ldots, s_7$.

- Map these ciphertexts back to an encryption of an element in $A$.

The first step, producing an encryption $t$ of $x^{254}$ where $c$ is an encryption of $x$, requires at most 13 fully homomorphic multiplications. The second step, extracting the ciphertexts $t_0, \ldots, t_7$, is essentially a single linear operation. The third step, adding the elements $t_0, \ldots, t_7$ together to produce $s_0, \ldots, s_7$, requires $4 \cdot 8 = 32$ homomorphic additions, due to the nature of the linear operation in AES. The final step, obtaining a single ciphertext from $s_0, \ldots, s_7$, is also an application of a linear operation. Thus the total cost of SubBytes is $13 \cdot C_m + 32 \cdot C_a + 2 \cdot C_l$.
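Here is a plaintext Python sketch of the S-Box method, with bytes represented as integers reduced by the AES polynomial. The inversion uses exactly the squaring/multiplication chain listed above: 13 multiplications, computing $x^{254}$. The bit-level step uses the standard AES affine map. Each output bit is the sum of five input bits (four additions per bit), and the constant 0x63 is added at the end.

```python
def xtime(a):
    # Multiply by X in K_8 = F_2[X]/(X^8 + X^4 + X^3 + X + 1).
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def inverse_by_chain(x):
    # The chain from the text: t <- x; six times (t <- t*t; t <- t*x); t <- t*t.
    # Exponents run 1, 3, 7, ..., 127 and finally 254, so t = x^254 = x^(-1) (and 0 -> 0).
    mults = 0
    t = x
    for _ in range(6):
        t = gf_mul(t, t)
        t = gf_mul(t, x)
        mults += 2
    t = gf_mul(t, t)
    mults += 1
    return t, mults

def affine(b):
    # Extract the eight bits (the t_0, ..., t_7 of the text), form each output bit
    # from five input bits (four additions per bit), then add the constant 0x63.
    bits = [(b >> i) & 1 for i in range(8)]
    out = 0
    for i in range(8):
        s = bits[i] ^ bits[(i + 4) % 8] ^ bits[(i + 5) % 8] ^ bits[(i + 6) % 8] ^ bits[(i + 7) % 8]
        out |= s << i
    return out ^ 0x63

def sub_bytes(state):
    return [affine(inverse_by_chain(x)[0]) for x in state]

print([hex(y) for y in sub_bytes([0x00, 0x01, 0x53])])   # ['0x63', '0x7c', '0xed']
```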

Our SIMD evaluation of the AES round function benefits from more than the ability to execute 16 operations in parallel. It can also work directly with $\mathbb{F}_{2^8}$ arithmetic operations, and decompose into bits where necessary for the linear transformation in the S-Box operation. The total cost of a round function is given by

$$14 \cdot C_m + 37 \cdot C_a + 7 \cdot C_l,$$

although interleaving operations could probably lower this cost.
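The round total is simply the sum of the per-step costs given above. The following small bookkeeping sketch in Python treats ShiftRows, a pure reordering, as costing one linear operation $C_l$.

```python
from collections import Counter

# Per-step costs of one AES round, as counts of C_m, C_a and C_l.
STEP_COSTS = {
    "AddRoundKey": Counter({"C_a": 1}),
    "ShiftRows": Counter({"C_l": 1}),
    "MixColumns": Counter({"C_m": 1, "C_a": 4, "C_l": 4}),
    "SubBytes": Counter({"C_m": 13, "C_a": 32, "C_l": 2}),
}

def round_cost(steps):
    # Add up the operation counts of all steps.
    total = Counter()
    for cost in steps.values():
        total.update(cost)
    return total

total = round_cost(STEP_COSTS)
print(" + ".join(f"{total[k]}*{k}" for k in ("C_m", "C_a", "C_l")))   # 14*C_m + 37*C_a + 7*C_l
```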



APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


8.3  Data  Base  Lookup 

We end by examining a more realistic application scenario, namely searching an encrypted database on a remote server. Suppose a user has previously encrypted a database and stored it with a cloud service provider, and now she wishes to retrieve some of the data. We first note that the usual atomic database operation of search actually consists of two operations: a search, followed by a retrieval. The following method performs the search using FHE and the retrieval using Private Information Retrieval (PIR).

We assume the database is such that one can determine beforehand which fields will be searched on. In some sense this is akin to the basic premise of public key encryption with keyword search [1]; however, we have a more complicated data retrieval operation to perform. To simplify the discussion, we assume that there is only one searchable database field, and another field that contains the information. Each database entry (in the clear) is then given by a tuple $(i, s, d)$, where $s$ is the search term, $d$ is the data and $i$ is an index that will enable retrieval. We denote the number of such items by $r$. We assume that $i$ and $s$ are $n$ bits in length, and thus can be encoded as elements of the finite field $K_n = \mathbb{F}_{2^n}$.

To encrypt the database, the user picks a public/private key pair $(pk, sk)$ for our scheme, as well as a symmetric key $K$ for a symmetric encryption scheme $(E_K, D_K)$. Let us assume that the encryption scheme can support $\ell$ operations in $\mathbb{F}_{2^n}$ in parallel. When placing the database with the cloud service provider, the user divides it into $\lceil r/\ell \rceil$ blocks of $\ell$ items. To send the server the $j$th encrypted data block, for $j = 0, 1, 2, \ldots, \lceil r/\ell \rceil - 1$, the user sends

$$(I_j, C_j, E_j) = \Big( (i_{\ell \cdot j+1}, \ldots, i_{\ell \cdot (j+1)}),\ \mathrm{Encrypt}(\Gamma_{n,\ell}(s_{\ell \cdot j+1}, \ldots, s_{\ell \cdot (j+1)}), pk),\ (E_K(d_{\ell \cdot j+1}), \ldots, E_K(d_{\ell \cdot (j+1)})) \Big).$$


We now discuss how the user retrieves all data items that correspond to the search term $s$. We first recover an encryption of an encoding of the indices of the entries that contain this search term. This is done by sending the server one ciphertext and receiving one in return. The "query" ciphertext sent is

$$q = \mathrm{Encrypt}(\Gamma_{n,\ell}(s, \ldots, s), pk),$$

i.e. an encryption of $\ell$ copies of the query term $s$.

The server then takes each data block $(I_j, C_j, E_j)$ and computes $c'_j = q \oplus_{pk} C_j$. The value $c'_j$ is then homomorphically raised to the power $2^n - 1$ by performing $2n$ applications of Mult. Since $x^{2^n - 1} = 1$ for every nonzero $x \in \mathbb{F}_{2^n}$ and $0^{2^n - 1} = 0$, this results in an encryption of a vector of zeros and ones. A one occurs in position $k$ exactly when $s$ is not equal to the $k$th component of the vector underlying the ciphertext $C_j$.

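The server then computes

$$\big(c'_j \oplus_{pk} \mathrm{Encrypt}(\Gamma_{n,\ell}(1, 1, \ldots, 1), pk)\big) \otimes_{pk} \mathrm{Encrypt}(\Gamma_{n,\ell}(I_j), pk),$$

which in each position $k$ encrypts the $k$th index of block $j$ if the search term matches, and zero otherwise. The resulting ciphertexts for all blocks are then added together using Add to obtain a final ciphertext $c'$, which is returned to the user. Note that this "search" query costs $(2 \cdot n + 1) \cdot C_m + 2 \cdot C_a$ per data block.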

The plaintext underlying the returned ciphertext $c'$ consists of $\ell$ components, where the $k$th component is given by

$$\bigoplus_{j \,:\, s_{\ell \cdot j + k} = s} i_{\ell \cdot j + k}.$$
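Simulating the search step in plaintext shows why the returned components are xors of matching indices. The Python sketch below makes three simplifying assumptions:

- It takes $n = 8$, so $K_n$ is represented with the AES polynomial.
- It omits encryption and the packing into the algebra entirely.
- It works slot by slot.

The function names and the toy database are hypothetical.

```python
N_BITS = 8          # toy n: K_n = F_2[X]/(X^8 + X^4 + X^3 + X + 1)
POLY = 0x11B

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << N_BITS):
            a ^= POLY
        b >>= 1
    return r

def gf_pow(x, e):
    # Square-and-multiply; gf_pow(0, e) = 0 for e > 0.
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r

def search_block(query, terms, indices):
    # Slot-wise version of the server's work on one block.
    out = []
    for s_k, i_k in zip(terms, indices):
        diff = query ^ s_k                       # q + C_j (characteristic 2)
        neq = gf_pow(diff, 2 ** N_BITS - 1)      # 1 if s_k != query, else 0
        out.append(gf_mul(neq ^ 1, i_k))         # keeps i_k only on a match
    return out

def search(query, blocks, ell):
    # Add the per-block results together, component by component.
    acc = [0] * ell
    for terms, indices in blocks:
        acc = [a ^ b for a, b in zip(acc, search_block(query, terms, indices))]
    return acc

blocks = [([5, 7, 5, 9], [1, 2, 3, 4]), ([5, 5, 1, 2], [10, 20, 30, 40])]
print(search(5, blocks, 4))   # [11, 20, 3, 0]
```

Component 0 matches in both blocks, so the user receives $1 \oplus 10 = 11$ there rather than either index on its own. This is exactly the multiple-match problem discussed next.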

If there is at most one match per component, then we have recovered the matching indices. We can therefore recover the actual data by engaging in a PIR protocol [5, 18]. The problem arises when there may be more than one match per component per query. In this situation we need an encoding algorithm that lets us recover the exact PIR inputs required to retrieve the data.

In the extreme case, every component may contain $\lceil r/\ell \rceil$ matches, i.e. the search term $s$ matches every item in the database. In that case, using a trivial encoding (a one-hot index, one bit per block), we must have $\lceil r/\ell \rceil \leq n$. This essentially implies that the length of the database is bounded by the number of bits we can encrypt, i.e. $r \leq \ell \cdot n$.

However, if we can ensure that at most $t$ matches occur per SIMD component, then we can produce a more effective encoding as follows. First, we assume the encoding used for data retrieval in the PIR is such that we recover the data item corresponding to an index/component position pair. This simplifies our discussion, as we only have to concentrate on decoding a single component.

We set $m = \lceil r/\ell \rceil$, and to each of the $m$ blocks we associate an $n$-bit index $i$. We therefore want to be able, given an xor of indices $z = i_{j_1} \oplus \cdots \oplus i_{j_s}$ with $s \leq t$, to recover the set $\{j_1, \ldots, j_s\}$. To construct the encoding, we take the parity check matrix of an $[N, K, D]$ linear code over $\mathbb{F}_2$ of length $N$, rank $K$ and minimum distance $D$, which we assume is greater than $2 \cdot t$. This is a matrix of dimension $(N - K) \times N$. We then take the columns of this matrix as our indices, which implies that these indices must fit in $n$ bits, hence $N - K \leq n$.

Given an xor of at most $t$ indices, we can recover which indices were xor-ed together by decoding the $[N, K, D]$ linear code. To see this, notice that the sum of indices $z$ is the syndrome of a word of Hamming weight at most $t$, whose nonzero positions are exactly the indices involved. Since $D > 2 \cdot t$, this word is uniquely determined by its syndrome. Thus, by recovering the error positions from the syndrome, we know which indices (i.e. which columns of the parity check matrix) were xor-ed together. The total number of distinct indices we can cope with is therefore bounded by the number of columns of the parity check matrix, i.e. $N$. Hence, we obtain

$$m = \lceil r/\ell \rceil \leq N.$$
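The decoding argument can be tried out on a toy code. The Python sketch below uses the binary $[15, 7]$ BCH code with designed distance 5, an illustrative choice that does not come from the text. Its parity-check columns $(\alpha^i, \alpha^{3i})$ over $\mathbb{F}_{16}$ are packed into 8-bit indices. Since $D \geq 5 > 2 \cdot 2$, every xor of at most two indices identifies its summands. Decoding is done here by brute-force table lookup, which is only feasible for tiny parameters; realistic sizes would need an algebraic BCH decoder.

```python
from itertools import combinations

def gf16_powers():
    # alpha^0, ..., alpha^14 in F_16 = F_2[x]/(x^4 + x + 1), with alpha = x (primitive).
    powers, a = [], 1
    for _ in range(15):
        powers.append(a)
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return powers

def bch_15_7_indices():
    # Column i of the parity-check matrix: (alpha^i, alpha^(3i)) packed into 8 bits.
    p = gf16_powers()
    return [p[i] | (p[(3 * i) % 15] << 4) for i in range(15)]

def build_syndrome_table(indices, t):
    # Map every xor of at most t indices to the set of positions that produced it.
    table = {}
    for size in range(t + 1):
        for subset in combinations(range(len(indices)), size):
            z = 0
            for j in subset:
                z ^= indices[j]
            if z in table:
                raise ValueError("two index sets share a syndrome: D <= 2t")
            table[z] = set(subset)
    return table

indices = bch_15_7_indices()
table = build_syndrome_table(indices, t=2)
z = indices[3] ^ indices[11]     # xor of two matching indices, as returned by the search
print(sorted(table[z]))          # [3, 11]
```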

As an example of a possible encoding scheme, we take a primitive BCH code, which exists for any pair of values $(s, t)$ such that $s \geq 3$ and $t < 2^{s-1}$. The primitive BCH code over $\mathbb{F}_2$ then has parameters $N = 2^s - 1$, $N - K \leq s \cdot t$ and $D \geq 2 \cdot t + 1$. If we take the FHE scheme of the previous section using the $m$th cyclotomic polynomial with $m = 3485$, then we have $\ell = 64$, $n \leq d = 40$ and $\phi(m) = 2560$. Given the bounds

$$\lceil r/\ell \rceil \leq N = 2^s - 1 \quad \text{and} \quad s \cdot t \leq n,$$

suppose we take $t = 3$, so that we can recover at most three collisions on search terms within each component. Then setting $n = d = 40$ and $(s, t) = (13, 3)$ gives a valid encoding. This implies that the total number of items within the database is bounded by $\ell \cdot N = 64 \cdot 8191 = 524224$. Clearly, using better codes or different cyclotomic polynomials, one can support larger databases or deal with more collisions within a component.
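The parameter choice can be reproduced with a little arithmetic. This Python sketch takes the largest $s$ with $s \cdot t \leq n$, which is a sufficient condition for $N - K \leq n$. It reports $s$, $N = 2^s - 1$ and the resulting bound $\ell \cdot N$ on the number of database items.

```python
def bch_database_bound(n, t, ell):
    # Largest s with s * t <= n; this guarantees N - K <= s * t <= n for the
    # primitive BCH code of length N = 2^s - 1 correcting t errors.
    s = n // t
    if s < 3 or t >= 2 ** (s - 1):
        raise ValueError("no primitive BCH code for these parameters")
    N = 2 ** s - 1
    return s, N, ell * N

print(bch_database_bound(n=40, t=3, ell=64))   # (13, 8191, 524224)
```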

The above methodology, which uses our SIMD-enabled FHE scheme to search on $\ell$ components at once, results in a linear speed-up in the search of the encrypted database. However, splitting the database into $\ell$ components has another advantage: we can deal with more collisions between the search terms (albeit with some probability of invalid indices being returned). In the above example we could deal with up to three collisions in each component. Our method was therefore guaranteed to be correct if there were at most three items in the database corresponding to each search item. However, if we assume that the search items are randomly distributed among the $\ell$ components, then in practice we can deal with more collisions, since our results will be correct as long as there are at most $t$ collisions per component. The generalised birthday bound [24] says that we can have

$$\sqrt[t]{t! \cdot \ell^{t-1}}$$

collisions before the probability of obtaining more than $t$ collisions in one of the $\ell$ components exceeds $1/2$. In our above numerical example, with $t = 3$ and $\ell = 64$, this equates to just over 29 matches in our database.
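Assume the bound takes the form $\sqrt[t]{t! \cdot \ell^{t-1}}$, the usual estimate for a $t$-fold multicollision among $\ell$ bins, which reproduces the figure of just over 29 quoted for $t = 3$ and $\ell = 64$. The number can then be checked directly:

```python
from math import factorial

def multicollision_bound(t, ell):
    # (t! * ell^(t-1))^(1/t): rough number of uniformly random items that can be
    # spread over ell components before a t-fold collision becomes likely.
    return (factorial(t) * ell ** (t - 1)) ** (1.0 / t)

print(round(multicollision_bound(3, 64), 2))   # 29.07
```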
Abstract. A key problem with the original implementation of the Gentry Fully Homomorphic Encryption scheme was the slow key generation process. Gentry and Halevi provided a fast technique for 2-power cyclotomic fields. We present an extension of the Gentry-Halevi key generation technique for arbitrary cyclotomic fields. Our new method is roughly twice as efficient as the previous best methods. Our estimates are backed up with experimental data.


The major theoretical cryptographic advance in the last three years was the discovery by Gentry in 2009 of a fully homomorphic encryption scheme [4, 5]. Gentry's scheme was initially presented as a completely theoretical construction. However, it was soon realised that by specialising the construction one could obtain a system that could at least be implemented, although not yet in a way that enabled practical computations. The first such implementation was presented by Smart and Vercauteren [10]. It used arithmetic in cyclotomic number fields. In particular, they focused on the field generated by the polynomial $F(X) = X^{2^n} + 1$, but noted that the scheme could be applied with arbitrary (even non-cyclotomic) number fields. A main problem with the version of Smart and Vercauteren was that the key generation method was very slow indeed.

In [6], Gentry and Halevi presented a new implementation of the Smart-Vercauteren variant, but with a greatly improved key generation phase. In particular, Gentry and Halevi note that key generation (for cyclotomic fields) is essentially an application of a Discrete Fourier Transform, followed by a small amount of computation, and then an application of the inverse Discrete Fourier Transform. They then show that one does not even need to perform the DFTs if one selects the cyclotomic field to be of the form $X^{2^n} + 1$. They do this by providing a recursive method to deduce two constants from the secret key, which enables the key generation algorithm to construct a valid associated public key. The key generation method of Gentry and Halevi is fast, but appears particularly tailored to working with two-power roots of unity.

However, the extra speed of their key generation method comes at a cost. Restricting to two-power roots of unity precludes the type of SIMD operations discussed in [11]. To enable such operations, one needs to be able to deal with general cyclotomic number fields. In [11] it is pointed out that the DFT/inverse-DFT method can easily be applied to general cyclotomic fields via FFT algorithms such as those of Good-Thomas [7, 12], Rader [9] and others. However, the simple recursive method of Gentry and Halevi does not seem to apply.

Other works have examined ways of improving key generation for fully homomorphic encryption schemes. For example, [8] gives a method to construct keys for essentially random number fields by sampling random elements and analyzing the eigenvalues of the corresponding matrices. This method, however, does not allow the efficiency improvements of [10] and [6] with respect to reduced ciphertext sizes etc. More recent fully homomorphic schemes based on the LWE assumption [3] have more efficient key generation procedures than the original Gentry scheme, and appear to be more suitable in practice. However, in this work we concentrate purely on the schemes in the "Gentry family".

In this paper we present an analysis of the key generation algorithm for Gentry-based schemes over general cyclotomic fields, generated by the primitive $m$-th roots of unity. In particular, we show that Gentry and Halevi's recursive method can be generalised to deal with prime power values of $m$, and also any $m$ with just a few small, repeated prime factors. We also show that, for general $m$, the DFT/inverse-DFT method is sub-optimal, and that an algorithm exists which requires only a single DFT application to compute the secret key. Our general key generation method is essentially twice as fast as previous methods, both theoretically and in practice.

The paper is organized as follows. In Section 1 we present the required mathematical background and notation. In Section 2 we present the required information about the key generation method for the variant of Gentry's scheme we will be discussing. Then in Section 3 we describe how to execute the key generation procedure once two coefficients of an associated polynomial $g(X)$ and one coefficient of another associated polynomial $h(X)$ have been computed. Algorithms to compute these three coefficients are then presented in Section 4. Finally, in Section 5 we present some experimental results.


1  Mathematical  Background 

Let $F(X) = \Phi_m(X)$ denote the $m$-th cyclotomic polynomial, i.e. the irreducible polynomial whose roots are the primitive $m$-th roots of unity. This polynomial has degree $N = \phi(m)$, where $\phi(\cdot)$ is Euler's phi-function. We denote the $m$-th roots of unity by $\omega_m^0, \ldots, \omega_m^{m-1}$, defined as powers of $\omega_m = \exp(2\pi\sqrt{-1}/m)$, the principal $m$-th root of unity. The roots of $F(X)$ are those values $\omega_m^i$ where $\gcd(i, m) = 1$. We let $\rho_0, \ldots, \rho_{N-1}$ denote these primitive $m$-th roots of unity (i.e. the roots of $F$).

If $f(X) \in \mathbb{Z}[X]$ is an arbitrary polynomial, then we let $f_i$ denote the coefficient of $X^i$ in $f(X)$. For a polynomial $f(X)$, we let $\|f\|_\infty = \max_{0 \leq i \leq \deg f} |f_i|$ denote the infinity-norm (i.e. the max-norm) of its coefficient vector. Given two polynomials $f(X)$ and $g(X)$, the resultant of $f$ and $g$ is defined to be

$$\mathrm{resultant}(f, g) = \prod_{\alpha, \beta} (\alpha - \beta),$$

where $\alpha$ ranges over the roots of $f(X)$ and $\beta$ ranges over the roots of $g(X)$. We also have that

$$\mathrm{resultant}(f, g) = \prod_{\alpha} g(\alpha). \qquad (1)$$
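For small $m$, both $\Phi_m(X)$ and Equation (1) are easy to evaluate directly. The Python sketch below computes $\Phi_m$ exactly by dividing $X^m - 1$ by the cyclotomic polynomials of the proper divisors of $m$. It evaluates Equation (1) with $f = \Phi_m$ as a floating-point product over the primitive roots. The rounding makes this a toy, unsuitable for the parameter sizes used in practice.

```python
import cmath
from math import gcd

def poly_divexact(num, den):
    # Exact division of integer polynomials given as coefficient lists
    # (lowest degree first); den must be monic.
    num = list(num)
    q = [0] * (len(num) - len(den) + 1)
    for k in range(len(q) - 1, -1, -1):
        c = num[k + len(den) - 1]
        q[k] = c
        for i, a in enumerate(den):
            num[k + i] -= c * a
    if any(num):
        raise ValueError("division is not exact")
    return q

_CYCLO = {}

def cyclotomic(m):
    # Phi_m(X) = (X^m - 1) / product of Phi_e(X) over proper divisors e of m.
    if m not in _CYCLO:
        p = [-1] + [0] * (m - 1) + [1]
        for e in range(1, m):
            if m % e == 0:
                p = poly_divexact(p, cyclotomic(e))
        _CYCLO[m] = p
    return _CYCLO[m]

def evaluate(poly, x):
    # Horner evaluation.
    acc = 0
    for c in reversed(poly):
        acc = acc * x + c
    return acc

def primitive_roots(m):
    return [cmath.exp(2j * cmath.pi * i / m) for i in range(m) if gcd(i, m) == 1]

def resultant_with_cyclotomic(v, m):
    # Equation (1): the product of v over the roots of Phi_m, rounded to an integer.
    prod = 1
    for rho in primitive_roots(m):
        prod *= evaluate(v, rho)
    return round(prod.real)

print(cyclotomic(15))                            # [1, -1, 0, 1, -1, 1, 0, -1, 1]
print(resultant_with_cyclotomic([-2, 1], 15))    # 151, i.e. Phi_15(2)
```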

Given a polynomial $x(X)$ of degree $m - 1$, which is simply a list of coefficients $x_0, x_1, \ldots, x_{m-1}$, the Discrete Fourier Transform (DFT) is defined as the evaluation of this polynomial at all of the $m$-th roots of unity. So the $k$-th coefficient of the DFT is

$$\hat{x}_k = \sum_{i=0}^{m-1} x_i \, \omega_m^{ik}.$$

Naive computation of the DFT from this definition takes $O(m^2)$ operations. Fast Fourier Transform (FFT) algorithms reduce this to $O(m \log m)$. The inverse DFT is the procedure that takes the $m$ evaluations of a polynomial at the $m$-th roots of unity and recovers the polynomial. We write $\hat{x} \leftarrow \mathrm{DFT}(x)$ and $x \leftarrow \mathrm{DFT}^{-1}(\hat{x})$.
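A direct transcription of the DFT definition is useful for checking small cases. This Python sketch is the naive $O(m^2)$ method in complex floating-point arithmetic; in practice an FFT would replace it. The sign convention follows the definition above (powers of $\omega_m = \exp(2\pi\sqrt{-1}/m)$), which is the opposite of the forward transform in many FFT libraries.

```python
import cmath

def _root(m, e):
    # omega_m^e with omega_m = exp(2*pi*sqrt(-1)/m); e is reduced mod m for accuracy.
    return cmath.exp(2j * cmath.pi * (e % m) / m)

def dft(x):
    # hat_x_k = sum_i x_i * omega_m^(i*k): the polynomial x(X) at every m-th root.
    m = len(x)
    return [sum(x[i] * _root(m, i * k) for i in range(m)) for k in range(m)]

def inverse_dft(x_hat):
    # Recover the m coefficients from the m evaluations.
    m = len(x_hat)
    return [sum(x_hat[k] * _root(m, -i * k) for k in range(m)) / m for i in range(m)]

x = [3, 1, 4, 1, 5, 9, 2]
x_hat = dft(x)
recovered = [round(z.real) for z in inverse_dft(x_hat)]
print(recovered)   # [3, 1, 4, 1, 5, 9, 2]
```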

2  Key  Generation  for  Gentry 

Key generation for Gentry's FHE scheme depends on two parameters $m$ and $t$. The value $m$ defines the underlying cyclotomic field as above, and we define $N = \phi(m)$, the degree of the cyclotomic polynomial $F(X)$. The parameter $t$ defines how "small" the secret key is. Note that in practice the word "small" is a relative term, and we are not really dealing with small numbers at all. To generate keys for Gentry's FHE scheme, one can proceed as follows:

- Choose $v(X) \in \mathbb{Z}[X]$ with $\|v\|_\infty < 2^t$ and $v(X) \equiv 1 \pmod 2$.

- Compute $w(X) \in \mathbb{Z}[X]$ such that

$$d = v(X) \cdot w(X) \pmod{F(X)},$$

where $d = \mathrm{resultant}(v, F)$.

- If $v(X)$ and $F(X)$ do not have a common root modulo $d$, then return to the beginning and choose another $v(X)$.

- Let $\alpha \in \mathbb{Z}_d$ denote the common root.

- Set $pk \leftarrow (\alpha, d)$ and $sk \leftarrow (w(X), d)$.

Note that there are various minor variations on the above procedure in the literature. In Smart and Vercauteren [10], the polynomial $v(X)$ is rejected unless $d$ is prime; this is due to the method the authors used to compute the common root $\alpha$. Gentry and Halevi [6] notice that if $v(X)$ and $F(X)$ have a common root modulo $d$, then it is given by $\alpha = -w_{N-1}/w_0 \pmod d$. Gentry and Halevi make an




additional modification: the condition $v(X) \equiv 1 \pmod 2$ is dropped and replaced by the condition $d \equiv 1 \pmod 2$. This means the authors only need to compute one coefficient of $w(X)$ for their application. However, in [11], the authors show that selecting $v(X) \equiv 1 \pmod 2$ enables SIMD-style operations on data, as long as $m \neq 2^r$. They also show that, although all coefficients of $w(X)$ are needed in the secret key, one can generate all of them via the relation

$$w_i = \begin{cases} \alpha \cdot w_{i+1} + F_{i+1} \cdot w_{N-1} \pmod d & \text{if } 0 \leq i < N-1, \\ -\alpha \cdot w_0 \pmod d & \text{if } i = N-1. \end{cases} \qquad (2)$$


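The main question is then how to compute $w_0$ and $d$. In [6, 11] it is pointed out that the following DFT-based procedure can be applied:

- $\hat{v} \leftarrow \mathrm{DFT}(v(X))$.

- $d \leftarrow \prod_{\gcd(i,m)=1} \hat{v}_i$.

- $\hat{w}_i \leftarrow d/\hat{v}_i$.

- $w(X) \leftarrow \mathrm{DFT}^{-1}(\hat{w})$.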
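The DFT-based procedure can be sketched for a toy case. The Python code below assumes $m = 15$ and $v(X) = 5 + 2X - 2X^3$, both illustrative choices. It evaluates $v$ at all 15-th roots of unity and forms $d$ from the primitive ones. It then sets $\hat{w}_i = d/\hat{v}_i$ at every root, which requires that $v$ does not vanish at any of them. The inverse DFT then yields a polynomial of degree below $m$, which is reduced modulo $\Phi_{15}(X)$. The final check confirms $v(X) \cdot w(X) \equiv d \pmod{F(X)}$. Floating point is adequate here only because the numbers are tiny, and the common-root test of the key generation procedure is not performed.

```python
import cmath
from math import gcd

PHI_15 = [1, -1, 0, 1, -1, 1, 0, -1, 1]   # Phi_15(X), lowest degree first

def _root(m, e):
    return cmath.exp(2j * cmath.pi * (e % m) / m)

def dft(x, m):
    x = list(x) + [0] * (m - len(x))
    return [sum(x[i] * _root(m, i * k) for i in range(m)) for k in range(m)]

def inverse_dft(x_hat, m):
    return [sum(x_hat[k] * _root(m, -i * k) for k in range(m)) / m for i in range(m)]

def reduce_mod(poly, F):
    # Remainder of poly modulo the monic polynomial F.
    poly = list(poly)
    n = len(F) - 1
    for k in range(len(poly) - 1, n - 1, -1):
        c = poly[k]
        for i, a in enumerate(F):
            poly[k - n + i] -= c * a
    return poly[:n]

def keygen_dft(v, m, F):
    # v_hat = DFT(v); d = product of v_hat_i over gcd(i, m) = 1;
    # w_hat_i = d / v_hat_i; w = DFT^{-1}(w_hat), reduced modulo F.
    v_hat = dft(v, m)
    if min(abs(z) for z in v_hat) < 1e-9:
        raise ValueError("v vanishes at an m-th root of unity")
    d = 1
    for i in range(m):
        if gcd(i, m) == 1:
            d *= v_hat[i]
    w = reduce_mod(inverse_dft([d / z for z in v_hat], m), F)
    return round(d.real), [round(c.real) for c in w]

def polymul_mod(a, b, F):
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return reduce_mod(prod, F)

v = [5, 2, 0, -2]                        # v(X) = 5 + 2X - 2X^3, so v = 1 (mod 2)
d, w = keygen_dft(v, 15, PHI_15)
print(d, polymul_mod(v, w, PHI_15) == [d] + [0] * 7)
```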

Gentry and Halevi [6] then go on to notice that one can actually compute $w(X)$ and $d$ without computing any DFTs. They can do this because they focus solely on the case $m = 2^r$, which lets them present the calculation of $d$ and $w(X)$ as the calculation of two coefficients of an associated polynomial $g(X)$.

In this paper we generalise this method of Gentry and Halevi to arbitrary values of $m$. For values of $m$ that are not prime powers, we still require the application of a single DFT algorithm, but no longer need the inverse DFT. The key observation is that, for general $m$, $d$ and $w(X)$ are related to the coefficients of two associated polynomials $g(X)$ and $h(X)$. It is to these polynomials, and their properties, that we now turn.


3 The Polynomials g(X) and h(X)

Before proceeding, we introduce Ramanujan sums for readers who are not acquainted with them. A Ramanujan sum is simply a sum of powers of primitive roots of unity:

$$C_m(k) := \sum_{\substack{0 \leq i \leq m-1 \\ \gcd(i, m) = 1}} \omega_m^{ik} = \sum_{d \mid \gcd(k, m)} \mu\!\left(\frac{m}{d}\right) d,$$

where the second sum is over the positive divisors of $\gcd(k, m)$, and $\mu$ is the Möbius function. For a proof of this formula see e.g. [2, p. 162]. The Ramanujan sum can therefore be easily computed provided $m$ can be factored efficiently; this will always be the case in our applications, since $m$ is a small integer. It is clear from this formula that $C_m(-k) = C_m(k)$. We will also need the following result:
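Both expressions for $C_m(k)$ are easy to compare numerically. This Python sketch computes the Möbius-function form exactly with integers, and the defining sum of powers of primitive roots in floating point.

```python
import cmath
from math import gcd

def mobius(n):
    # Moebius function by trial division.
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0              # square factor
            result = -result
        p += 1
    return -result if n > 1 else result

def ramanujan_sum(m, k):
    # C_m(k) = sum over positive divisors e of gcd(k, m) of mu(m/e) * e (gcd(0, m) = m).
    g = gcd(k, m)
    return sum(mobius(m // e) * e for e in range(1, g + 1) if g % e == 0)

def ramanujan_sum_direct(m, k):
    # Sum of the k-th powers of the primitive m-th roots of unity, in floating point.
    return sum(cmath.exp(2j * cmath.pi * ((i * k) % m) / m)
               for i in range(m) if gcd(i, m) == 1)

print([ramanujan_sum(15, k) for k in range(15)])
# [8, 1, 1, -2, 1, -4, -2, 1, 1, -2, -4, 1, -2, 1, 1]
```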
Proposition 1. Let $F_i$ denote the $i$-th coefficient of the $m$-th cyclotomic polynomial $F(X)$. Then for $k = 0, \ldots, N - 1$,

$$\sum_{i=0}^{N-1} C_m(i - k) \cdot F_{i+1} = -C_m(-k - 1).$$

Proof. Suppose that $\xi$ is a root of $F$. Since $F$ is a cyclotomic polynomial, $F_0 = F_N = 1$, and so

$$-1 = \sum_{i=1}^{N} F_i \xi^i = \sum_{i=0}^{N-1} F_{i+1} \xi^{i+1}.$$

This is equivalent to

$$-\xi^{-1} = \sum_{i=0}^{N-1} F_{i+1} \xi^{i}.$$

Multiplying this relation by $\xi^{-k}$ and summing over all roots $\xi$ of $F$ (the summands of $C_m(k)$ are exactly powers of the roots of $F$) gives the desired result.
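Proposition 1 can be confirmed exactly for small $m$ using only integer arithmetic. The following Python sketch builds $\Phi_m$ by exact polynomial division and uses the Möbius form of the Ramanujan sum. It checks a handful of values of $m$ and is not a proof.

```python
from math import gcd

def poly_divexact(num, den):
    # Exact division by a monic integer polynomial (lowest degree first).
    num = list(num)
    q = [0] * (len(num) - len(den) + 1)
    for k in range(len(q) - 1, -1, -1):
        c = num[k + len(den) - 1]
        q[k] = c
        for i, a in enumerate(den):
            num[k + i] -= c * a
    return q

_CYCLO = {}

def cyclotomic(m):
    if m not in _CYCLO:
        p = [-1] + [0] * (m - 1) + [1]
        for e in range(1, m):
            if m % e == 0:
                p = poly_divexact(p, cyclotomic(e))
        _CYCLO[m] = p
    return _CYCLO[m]

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def ramanujan_sum(m, k):
    g = gcd(k, m)
    return sum(mobius(m // e) * e for e in range(1, g + 1) if g % e == 0)

def check_proposition_1(m):
    # sum_{i=0}^{N-1} C_m(i - k) F_{i+1} == -C_m(-k - 1) for k = 0, ..., N - 1.
    F = cyclotomic(m)
    N = len(F) - 1
    return all(
        sum(ramanujan_sum(m, i - k) * F[i + 1] for i in range(N)) == -ramanujan_sum(m, -k - 1)
        for k in range(N)
    )

print({m: check_proposition_1(m) for m in (3, 12, 15, 20, 21, 105)})
```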


We now turn to our key generation method. Given $v(X)$, we define the following polynomials:

$$g(X) := \prod_{i=0}^{N-1} \big(v(\rho_i) - X\big), \qquad h(X) := \prod_{i=0}^{N-1} \big(v(\rho_i) - X/\rho_i\big).$$

The polynomial $g$ here is the same as that defined in [6]. However, when $m$ is not a power of 2, we also need to introduce $h(X)$ in order to help us find $w$.

The constant and degree-one coefficients of these polynomials, i.e. $g_0$, $g_1$, $h_0$ and $h_1$, must then be computed. We defer the discussion of how this is done to the next section. In this section we detail how, given these coefficients, we can compute $w(X)$ and $d$. Note that, because of Equation (1), the values $g_0$ and $h_0$ are both equal to the resultant $d$ of $v$ and $F$.

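We also have

$$g_1 = -\sum_{i=0}^{N-1} \prod_{j \neq i} v(\rho_j) = -\sum_{i=0}^{N-1} \frac{d}{v(\rho_i)} = -\sum_{i=0}^{N-1} w(\rho_i) \qquad (3)$$

and similarly

$$h_1 = -\sum_{i=0}^{N-1} \frac{1}{\rho_i} \prod_{j \neq i} v(\rho_j) = -\sum_{i=0}^{N-1} \frac{d}{\rho_i \, v(\rho_i)} = -\sum_{i=0}^{N-1} \frac{w(\rho_i)}{\rho_i}, \qquad (4)$$

where we use $w(\rho_i) = d/v(\rho_i)$, which follows from $d = v(X) \cdot w(X) \pmod{F(X)}$.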
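Equations (3) and (4) can be checked numerically by expanding $g(X)$ and $h(X)$ directly from their definitions. The Python sketch below assumes $m = 15$ and $v(X) = 5 + 2X - 2X^3$ (toy values) and uses floating-point roots of unity. It compares the degree-one coefficients with the sums over $w(\rho_i) = d/v(\rho_i)$, and checks that $g_0 = h_0$ and that $h_1$ comes out (numerically) integral.

```python
import cmath
from math import gcd

def primitive_roots(m):
    return [cmath.exp(2j * cmath.pi * i / m) for i in range(m) if gcd(i, m) == 1]

def evaluate(poly, x):
    acc = 0
    for c in reversed(poly):
        acc = acc * x + c
    return acc

def expand(factors):
    # Coefficients (lowest degree first) of the product of (a + b*X) over the pairs (a, b).
    coeffs = [1]
    for a, b in factors:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] += a * c
            nxt[i + 1] += b * c
        coeffs = nxt
    return coeffs

m, v = 15, [5, 2, 0, -2]                                 # toy parameters
rhos = primitive_roots(m)
v_at = [evaluate(v, r) for r in rhos]
g = expand([(x, -1) for x in v_at])                      # g(X) = prod (v(rho_i) - X)
h = expand([(x, -1 / r) for x, r in zip(v_at, rhos)])    # h(X) = prod (v(rho_i) - X/rho_i)
d = g[0]
w_at = [d / x for x in v_at]                             # w(rho_i) = d / v(rho_i)
g1_sum = -sum(w_at)                                      # Equation (3)
h1_sum = -sum(x / r for x, r in zip(w_at, rhos))         # Equation (4)
print(round(d.real), round(g[1].real), round(h[1].real))
```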
To determine the coefficients of $w$, we first look at a more general form of the above expressions for $g_1$ and $h_1$, and show how it relates to $w$. Define, for $k \geq 0$, the following sequence of sums:

$$W_k := \sum_{i=0}^{N-1} \frac{w(\rho_i)}{\rho_i^k}.$$

Our strategy from here onwards is as follows. We first give a simple expression for $W_k$ in terms of the coefficients of $w$. We then show that the values of $W_k$ can be easily computed independently, using what we already know about $g_1$ and $h_1$. Finally, by looking at successive terms $W_k$, we obtain a set of simultaneous equations involving the coefficients of $w$, and show that these can be solved to recover all of $w$.

Observe that, as a result of Equations (3) and (4), we have $W_0 = -g_1$ and $W_1 = -h_1$. More generally, we see that

$$W_k = \sum_{i=0}^{N-1} \frac{1}{\rho_i^k} \sum_{j=0}^{N-1} w_j \rho_i^j = \sum_{j=0}^{N-1} w_j \sum_{i=0}^{N-1} \rho_i^{j-k} = \sum_{j=0}^{N-1} C_m(j - k) \cdot w_j.$$

Thus the above equation expresses $W_k$ as a simple linear combination of the coefficients of $w$, weighted by the Ramanujan sums $C_m(j - k)$. Applying Equation (2), this allows us to deduce

Proposition 2.

$$W_k \equiv \alpha \cdot W_{k+1} \pmod d.$$


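Proof. Write $F(X) = \sum_{i=0}^{N} f_i X^i$ with $f_N = 1$. Then, modulo $d$,

$$\begin{aligned}
W_k &= \sum_{i=0}^{N-1} c_m(i-k) \cdot w_i \\
&\equiv \sum_{i=0}^{N-2} c_m(i-k) \cdot \alpha \cdot w_{i+1} + w_{N-1} \cdot \sum_{i=0}^{N-2} c_m(i-k) \cdot f_{i+1} + c_m(N-k-1) \cdot w_{N-1} \\
&= \alpha \cdot \sum_{i=0}^{N-2} c_m(i-k) \cdot w_{i+1} + w_{N-1} \cdot \sum_{i=0}^{N-1} c_m(i-k) \cdot f_{i+1} \\
&= \alpha \cdot \sum_{i=1}^{N-1} c_m(i-k-1) \cdot w_i - w_{N-1} \cdot f_0 \cdot c_m(-k-1) \\
&\equiv \alpha \cdot \sum_{i=1}^{N-1} c_m(i-k-1) \cdot w_i + \alpha \cdot w_0 \cdot c_m(-k-1) \\
&= \alpha \cdot W_{k+1}.
\end{aligned}$$

Here the second line uses $w_i \equiv \alpha \cdot w_{i+1} + f_{i+1} \cdot w_{N-1}$ for $0 \leq i \leq N-2$, and the fifth line uses $\alpha \cdot w_0 \equiv -f_0 \cdot w_{N-1}$; both are the coefficient-wise form of $X \cdot w(X) \equiv \alpha \cdot w(X)$ modulo $d$ and $F(X)$. The fourth line uses $\sum_{j=0}^{N} f_j \cdot c_m(j-k-1) = \sum_{i=0}^{N-1} \rho_i^{-k-1} F(\rho_i) = 0$.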

Proposition 2 yields the following immediate corollary:
Corollary 1.

$$W_k \equiv -g_1 \cdot \alpha^{-k} \pmod d.$$

Note that Proposition 2 immediately implies that $\alpha \equiv g_1/h_1 \pmod d$ (take $k = 0$ and use $W_0 = -g_1$, $W_1 = -h_1$), and thus any value of $W_k$ can easily be determined using the corollary. This allows us to create a system of linear equations in the coefficients of $w$ from the values $W_0, \ldots, W_{N-1}$, as follows:

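$$\begin{pmatrix}
c_m(0) & c_m(1) & \cdots & c_m(N-1) \\
c_m(1) & c_m(0) & \cdots & c_m(N-2) \\
\vdots & \vdots & \ddots & \vdots \\
c_m(N-2) & c_m(N-3) & \cdots & c_m(1) \\
c_m(N-1) & c_m(N-2) & \cdots & c_m(0)
\end{pmatrix}
\begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_{N-2} \\ w_{N-1} \end{pmatrix}
\equiv -g_1 \begin{pmatrix} 1 \\ \alpha^{-1} \\ \vdots \\ \alpha^{-(N-2)} \\ \alpha^{-(N-1)} \end{pmatrix} \pmod d$$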

We write the above equation as $C \cdot \mathbf{w} \equiv -g_1 \cdot \boldsymbol{\alpha} \pmod d$, where $\mathbf{w} = (w_0, \ldots, w_{N-1})^T$ and $\boldsymbol{\alpha} = (1, \alpha^{-1}, \ldots, \alpha^{-(N-1)})^T$. The matrix $C$ has the interesting property that every diagonal is constant, and since $c_m(-n) = c_m(n)$ it is also symmetric; as such it is a symmetric Toeplitz matrix. Such a system of equations can be solved in only $O(N^2)$ operations, as opposed to the usual $O(N^3)$ required for a general matrix [13]. We note that, for a given value of $m$, the matrix $C$ is fixed, and hence computing its inverse can be considered a precomputation. Thus, with this precomputation, computing the key from the coefficients $g_0$, $g_1$ and $h_1$ reduces to a linear operation: a single matrix-vector product modulo $d$.
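The recovery of $w$ from $g_0 = d$, $g_1$ and $h_1$ can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the Ramanujan sums are computed with the standard formula $c_m(n) = \sum_{e \mid \gcd(n, m)} \mu(m/e) \cdot e$, the system is solved by exact rational Gauss-Jordan elimination (not an $O(N^2)$ Toeplitz solver) and then reduced modulo $d$, and we assume that $h_1$ and every denominator that appears are invertible modulo $d$. All function names are hypothetical. The toy input is $m = 5$ and $v(X) = X + 2$, for which $d = 11$, $g_1 = -23$ and $h_1 = 17$.

```python
import math
from fractions import Fraction


def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, k = 1, 2
    while k * k <= n:
        if n % k == 0:
            n //= k
            if n % k == 0:
                return 0              # n had a squared prime factor
            result = -result
        k += 1
    return -result if n > 1 else result


def ramanujan_sum(m, n):
    """c_m(n): sum of rho^n over the primitive m-th roots of unity rho."""
    g = math.gcd(n, m)
    return sum(mobius(m // e) * e for e in range(1, g + 1) if g % e == 0)


def toeplitz_c(m, N):
    """Matrix C with C[k][j] = c_m(j - k): constant diagonals, symmetric."""
    return [[ramanujan_sum(m, j - k) for j in range(N)] for k in range(N)]


def solve_rational(A, b):
    """Solve A x = b exactly over Q by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def recover_w(m, N, d, g1, h1):
    """Return (alpha, w mod d) from d = g0, g1 and h1 (toy sketch)."""
    alpha = g1 * pow(h1, -1, d) % d                            # alpha = g1 / h1 mod d
    alpha_inv = pow(alpha, -1, d)
    rhs = [-g1 * pow(alpha_inv, k, d) % d for k in range(N)]   # W_k = -g1 * alpha^-k
    x = solve_rational(toeplitz_c(m, N), rhs)
    w = [q.numerator * pow(q.denominator, -1, d) % d for q in x]
    return alpha, w


print(recover_w(5, 4, 11, -23, 17))   # (9, [5, 8, 1, 10])
```

Here $\alpha = 9 \equiv -2 \pmod{11}$ is the common root of $v$ and $F$, and the output is $w(X) = 5 - 3X + X^2 - X^3$ reduced modulo 11, which indeed satisfies $w(X) \cdot (X + 2) \equiv 11 \pmod{F(X)}$.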

When it comes to computing the inverse of the matrix $C$, we note that experimentally it appears, for all $m$, to be of the form

$$C^{-1} = \frac{1}{m} \cdot Z$$

for some integral $N \times N$ matrix $Z$ whose coefficients are bounded in absolute value by $m$. However, we were unable to prove this. In any case, if we assume this holds, we can efficiently compute the inverse of $C$ by inverting $C/m$ using standard floating point arithmetic (which yields $Z$ up to rounding errors) and then rounding the resulting coefficients to integers. This matrix can then be divided by $m$, tested for correctness and stored.
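The rounding trick can be sketched as follows, under the same caveat that it relies on the experimentally observed (unproved) form $C^{-1} = Z/m$. The sketch inverts $C/m$ in double precision with Gauss-Jordan elimination and partial pivoting, rounds to integers, and then checks $C \cdot Z = m \cdot I$ exactly in integer arithmetic; the function names are illustrative, and double precision is only claimed to suffice for the small $N$ shown here.

```python
import math


def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, k = 1, 2
    while k * k <= n:
        if n % k == 0:
            n //= k
            if n % k == 0:
                return 0
            result = -result
        k += 1
    return -result if n > 1 else result


def ramanujan_sum(m, n):
    g = math.gcd(n, m)
    return sum(mobius(m // e) * e for e in range(1, g + 1) if g % e == 0)


def float_inverse(A):
    """Inverse of a square matrix by Gauss-Jordan with partial pivoting (floats)."""
    n = len(A)
    M = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def scaled_inverse(m, N):
    """Integer matrix Z with C^{-1} = Z / m, verified exactly."""
    C = [[ramanujan_sum(m, j - k) for j in range(N)] for k in range(N)]
    approx = float_inverse([[x / m for x in row] for row in C])   # (C/m)^{-1} = Z
    Z = [[round(x) for x in row] for row in approx]
    for i in range(N):
        for j in range(N):
            if sum(C[i][k] * Z[k][j] for k in range(N)) != (m if i == j else 0):
                raise ValueError("rounded matrix is not m * C^{-1}")
    return Z


print(scaled_inverse(5, 4))   # [[2, 1, 1, 1], [1, 2, 1, 1], [1, 1, 2, 1], [1, 1, 1, 2]]
```

For a prime $m$ the sketch returns $Z = I + J$ (with $J$ the all-ones matrix), so that $C^{-1} = (I + J)/5$ for $m = 5$; for $m = 8$ it returns $Z = 2I$.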

4 Determining $g_0$, $g_1$ and $h_1$

In this section we examine methods to determine the coefficients $g_0$, $g_1$ and $h_1$. We first present a general method, which works for arbitrary values of $m$ and leads to key generation that is essentially twice as fast as existing methods. We then describe a method for "special" values of $m$, namely those containing a large number of repeated factors, such as when $m$ is a prime power. By specialising the results of this section, and the method of the previous section, to the case $m = 2^r$, we obtain the key generation method of Gentry and Halevi.


4.1 General $m$


We note that the desired coefficients of $g$ and $h$ can be computed directly from the FFT of $v$. Thus by applying one FFT and the techniques of the previous section, we can avoid the second (inverse) FFT required by the method in Section 2. Hence we obtain a method which is essentially twice as fast as that of Section 2.

Recall that the FFT of $v$ gives the values $v(\rho_0), v(\rho_1), \ldots, v(\rho_{N-1})$. With these computed, $g_0$ is obtained by simply multiplying them together (as is done in the FFT-based key generation algorithm). Then note that

$$g_1 = -\sum_{i=0}^{N-1} \frac{g_0}{v(\rho_i)} \quad \text{and} \quad h_1 = -\sum_{i=0}^{N-1} \frac{g_0}{\rho_i \cdot v(\rho_i)}.$$

So the coefficients $g_1$ and $h_1$ can both be computed in $O(N)$ operations (albeit on numbers of $O(N \cdot t)$ bits in length), once the initial FFT of $v$ is computed. This may not seem a major improvement, since after all we have only saved one FFT out of two. However, there is a huge implied constant in the big-O notation, due to the fact that the coefficients of the polynomial $w(X)$ are all of size around $2^{N \cdot t}$, which in practice results in many millions of bits of precision being needed in the FFT algorithms.
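A minimal floating point sketch of these two formulas follows. It evaluates $v$ naively at the primitive roots instead of using an FFT, and it uses double precision, so it is only meaningful for toy parameters (real keys need numbers with millions of bits); the function name is hypothetical.

```python
import cmath
import math


def g0_g1_h1(v, m):
    """g0, g1 and h1 from the values v(rho_i) at the primitive m-th roots of unity.

    v is a coefficient list, lowest degree first. Naive evaluation stands in for
    the FFT and double precision for multi-precision arithmetic: toy sizes only."""
    roots = [cmath.exp(2j * cmath.pi * k / m) for k in range(1, m) if math.gcd(k, m) == 1]
    vals = [sum(c * rho ** e for e, c in enumerate(v)) for rho in roots]
    g0 = math.prod(vals)                                          # g0 = d
    g1 = -sum(g0 / val for val in vals)                           # O(N) once vals known
    h1 = -sum(g0 / (rho * val) for rho, val in zip(roots, vals))
    return round(g0.real), round(g1.real), round(h1.real)


print(g0_g1_h1([2, 1], 5))   # v(X) = X + 2, m = 5: (11, -23, 17)
```

For $m = 8$ and the same $v$ it gives $(17, -32, 16)$; in both cases $g_1/h_1 \equiv -2 \pmod d$ recovers the common root of $v$ and $F$.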


4.2 The case $m = p^r$

We first define the following two polynomials:

$$a(X) = \prod_{j=0}^{p-1} v(a_j \cdot X), \qquad b(X) = \sum_{i=0}^{p-1} \prod_{\substack{j=0 \\ j \neq i}}^{p-1} v(a_j \cdot X),$$

where $a_0, \ldots, a_{p-1}$ denote the $p$-th roots of unity. By elementary Galois theory we find that the coefficients of $a$ must be rational integers. Taking $a_1$ to be a primitive $p$-th root of unity, we observe that $a(a_1 \cdot X) = a(X)$, since multiplying by $a_1$ only permutes the factors; comparing coefficients, it follows that the $i$-th coefficient of $a$ is zero if $i$ is not a multiple of $p$. By a similar argument we also deduce that $b(X) \in \mathbb{Z}[X]$ and that $b_i = 0$ if $i$ is not a multiple of $p$.

Our algorithm will depend on starting with the polynomials $a(X)$ and $b(X)$, which can easily be computed thanks to the following observations. Firstly, by [1, Proposition 4.3.4], $a(X) = A(X^p)$, where

$$A(X) = (-1)^{(p+1) \cdot \deg v} \cdot \mathrm{resultant}_Y\bigl(v(Y),\, X - Y^p\bigr)$$

and $\mathrm{resultant}_Y(f, g)$ denotes the resultant, taken with respect to $Y$, of the bivariate polynomials $f$ and $g$ (the sign factor is $+1$ whenever $p$ is odd). Note that when computing this resultant, every occurrence of $Y^p$ in the polynomial $v(Y)$ can be replaced with $X$, which vastly speeds up the computation.
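The resultant identity can be checked with a short exact computation. As a sketch under a stated substitution, it evaluates the resultant through the standard formula $\mathrm{resultant}_Y(v(Y), X - Y^p) = \mathrm{lc}(v)^p \prod_{v(\theta) = 0} (X - \theta^p)$, i.e. $\mathrm{lc}(v)^p$ times the characteristic polynomial of $M^p$, where $M$ is the companion matrix of $v/\mathrm{lc}(v)$ (computed with the Faddeev-LeVerrier recurrence), rather than with a computer algebra system's resultant routine. Names are illustrative; polynomials are coefficient lists, lowest degree first.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def charpoly(M):
    """det(X*I - M), coefficients lowest degree first (Faddeev-LeVerrier, exact)."""
    n = len(M)
    c = [Fraction(0)] * n + [Fraction(1)]
    Mk = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        Mk = mat_mul(M, Mk)
        for i in range(n):
            Mk[i][i] += c[n - k + 1]
        AM = mat_mul(M, Mk)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def A_poly(v, p):
    """A(X) with a(X) = A(X^p), via (-1)^((p+1) deg v) * resultant_Y(v(Y), X - Y^p)."""
    v = [Fraction(x) for x in v]
    while len(v) > 1 and v[-1] == 0:
        v.pop()
    n, lc = len(v) - 1, v[-1]
    if n == 0:                                    # constant v: a(X) = v^p
        return [int(lc ** p)]
    M = [[Fraction(0)] * n for _ in range(n)]     # companion matrix of v / lc(v)
    for i in range(n):
        if i > 0:
            M[i][i - 1] = Fraction(1)
        M[i][n - 1] = -v[i] / lc
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(p):
        P = mat_mul(P, M)                         # eigenvalues theta^p
    sign = -1 if (p + 1) * n % 2 else 1
    coeffs = [sign * lc ** p * x for x in charpoly(P)]
    assert all(x.denominator == 1 for x in coeffs)
    return [int(x) for x in coeffs]


print(A_poly([-2, 1], 3))   # v = X - 2, p = 3: a(X) = X^3 - 8, so [-8, 1]
```

For example, $v(X) = X + 3$ with $p = 2$ gives $a(X) = v(X) v(-X) = 9 - X^2$, and the sketch returns $A = [9, -1]$, with the sign factor $(-1)^{3}$ doing the work.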

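Now notice also that, since $a(a_i \cdot X) = a(X)$,

$$b(X) = \sum_{i=0}^{p-1} \prod_{j \neq i} v(a_j \cdot X) = \sum_{i=0}^{p-1} \frac{a(X)}{v(a_i \cdot X)} = \sum_{i=0}^{p-1} \frac{a}{v}(a_i \cdot X).$$

Then, by writing $(a/v)(X) = \sum_{k \geq 0} B_k X^k$ and changing the order of summation, we obtain

$$b(X) = \sum_{k \geq 0} B_k \cdot X^k \cdot \Bigl(\sum_{i=0}^{p-1} a_i^k\Bigr) = p \cdot \sum_{j \geq 0} B_{pj} \cdot X^{pj},$$

since $\sum_{i=0}^{p-1} a_i^k$ equals $p$ if $p \mid k$ and $0$ otherwise. So the polynomial $b(X)$ can be computed from the coefficients of the quotient polynomial $a/v$; note that this is an exact polynomial division over $\mathbb{Z}[X]$.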
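A hedged sketch of this step: given $v$ and $a$ as integer coefficient lists (lowest degree first), it performs the exact division $a/v$ and keeps $p$ times the coefficients whose index is a multiple of $p$. The helper names are hypothetical.

```python
from fractions import Fraction


def poly_divexact(f, g):
    """Quotient f / g of integer polynomials, assumed to be exact over Z."""
    f = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    for i in range(len(f) - len(g), -1, -1):
        q[i] = f[i + len(g) - 1] / g[-1]
        for j, c in enumerate(g):
            f[i + j] -= q[i] * c
    if any(f) or any(c.denominator != 1 for c in q):
        raise ValueError("g does not divide f over Z")
    return [int(c) for c in q]


def b_poly(v, a, p):
    """b(X) = p * sum_j B_{pj} X^{pj}, where B = a / v."""
    B = poly_divexact(a, v)
    b = [p * c if k % p == 0 else 0 for k, c in enumerate(B)]
    while len(b) > 1 and b[-1] == 0:
        b.pop()
    return b


print(b_poly([-2, 1], [-8, 0, 0, 1], 3))   # v = X - 2, a = X^3 - 8: [12]
```

With $v = X - 2$, $a = X^3 - 8$ and $p = 3$ the quotient is $X^2 + 2X + 4$, so $b(X) = 3 \cdot 4 = 12$, which matches $\sum_i \prod_{j \neq i} v(a_j X)$ computed directly.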

Now recall the definition of $g$ in terms of $v$ evaluated at the primitive roots of unity:

$$g(X) := \prod_{i=0}^{N-1} \bigl(v(\rho_i) - X\bigr).$$

Since $m = p^r$, the primitive $m$-th roots of unity are closely related to the $p$-th roots of unity $a_0, \ldots, a_{p-1}$: with a suitable ordering of the $\rho_i$, for any $k \in \{0, \ldots, p-1\}$ and $0 \leq i < N/p$ we have

$$\rho_{i + k \cdot N/p} = a_k \cdot \rho_i.$$

Using this fact, the length-$N$ product defining $g$ above can be re-expressed as a length-$N/p$ product of $p$-products involving the $p$-th roots of unity. Applying this to $g$ and then reducing modulo $X^2$ (to obtain the lowest two coefficients) gives

$$g(X) = \prod_{i=0}^{N/p-1} \prod_{j=0}^{p-1} \bigl(v(a_j \cdot \rho_i) - X\bigr) \equiv \prod_{i=0}^{N/p-1} \Bigl(\prod_{j=0}^{p-1} v(a_j \cdot \rho_i) - X \cdot \sum_{j=0}^{p-1} \prod_{l \neq j} v(a_l \cdot \rho_i)\Bigr) = \prod_{i=0}^{N/p-1} \bigl(a(\rho_i) - X \cdot b(\rho_i)\bigr) \pmod{X^2}.$$

Since $a(X)$ and $b(X)$ are integer polynomials whose $i$-th coefficient is zero if $p$ does not divide $i$, and $F(X)$ (the $p^r$-th cyclotomic polynomial, $F(X) = \Phi_p(X^{p^{r-1}})$) has non-zero coefficients only at powers of $X$ that are multiples of $p^{r-1}$, the polynomials $a'(X) := a(X) \bmod F(X)$ and $b'(X) := b(X) \bmod F(X)$ will also have their $i$-th coefficient equal to zero whenever $p$ does not divide $i$ (for $r \geq 2$).

So, if we define the polynomials $V$ and $U$ such that $V(X^p) = a(X) \bmod F(X)$ and $U(X^p) = b(X) \bmod F(X)$, then we have reduced the original product of length $N$ over $v$ (of degree $N - 1$) down to a product of length $N/p$ over the polynomials $V$ and $U$ (of degree at most $N/p - 1$), evaluated at the primitive $(m/p)$-th roots of unity $\rho_i^p$. This process can be applied recursively, until we end up with a final product of size $N/p^{r-1} = p - 1$. This last product can then be computed in the naive manner to obtain $g(X) \bmod X^2$. A similar reduction can also be applied to $h$.

The algorithm in Figure 1 shows how this reduction can be applied to compute $g_0$ and $g_1$. A simple modification to the algorithm also allows $h_1$ to be computed at the same time. The proof of correctness is an obvious generalisation of the proof for the Gentry and Halevi reduction [6], and so is omitted.


4.3 $m$ contains repeated factors

The algorithm described above can be used to speed up the computation of $g$ and $h$ whenever $m$ contains a repeated prime factor. If $m = p_1^{e_1} \cdots p_s^{e_s}$, then for every $e_i > 1$, $e_i - 1$ steps of the algorithm in Figure 1 can be carried out. So after all of these reductions the final product to be computed will be of size $(p_1 - 1) \cdots (p_s - 1)$. Clearly this speed improvement is most pronounced when $m = p^r$ for some small $p$, but it is nevertheless useful to note that gains can be made for any $m$ with repeated prime factors.
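The bookkeeping of this subsection is easy to tabulate. The small, illustrative helper below factors $m$ and reports $N = \phi(m)$, the number of reduction steps $\sum_i (e_i - 1)$ and the final product size $\prod_i (p_i - 1)$, here for the four values of $m$ used in the experiments.

```python
def factorize(m):
    """Prime factorisation of m as {prime: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors


def reduction_summary(m):
    """(N = phi(m), number of reduction steps, size of the final product)."""
    f = factorize(m)
    N = 1
    for p, e in f.items():
        N *= p ** (e - 1) * (p - 1)
    steps = sum(e - 1 for e in f.values())
    final = 1
    for p in f:
        final *= p - 1
    return N, steps, final


for m in (4391, 5555, 6561, 10125):
    print(m, reduction_summary(m))
```

It shows, for instance, that $m = 6561 = 3^8$ admits 7 reduction steps down to a product of size 2, whereas the prime $m = 4391$ and the square-free $m = 5555$ admit none.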

5 Experimental Results

We now present some computational results for the relative performance of our new key generation method compared to the previous version. The original method was implemented in C++ using the MPFR library for arbitrary-precision floating point arithmetic, compiled using GCC 4.3.5. Our new method was coded with the computer algebra system Sage. Both algorithms were run on a high-powered server featuring an Intel Xeon E5620 processor running at 2.4GHz, with a 12MB cache and 48GB of memory.

COMPUTE-g1-COEFFICIENTS(v, p, r)
    m ← p^r
    F(X) ← Φ_m(X)
    U(X) ← 1
    V(X) ← v(X)
    while m > p
        v(X) ← V(X) (mod F(X))
        V(X) ← (−1)^((p+1)·deg v) · resultant_Y(v(Y), X − Y^p)
        q(X) ← U(X) · V(X^p) / v(X)
        u(X) ← p · Σ_{i=0}^{⌊deg(q)/p⌋} q_{p·i} · X^{p·i}
        U(X) ← u(X) (mod F(X))
        U(X) ← U(X^{1/p})
        m ← m/p
        F(X) ← Φ_m(X)
    // After the reduction, p − 1 terms are left in the product;
    // ρ denotes a primitive p-th root of unity.
    g_0 ← ∏_{i=1}^{p−1} V(ρ^i),   g_1 ← −Σ_{i=1}^{p−1} U(ρ^i) · ∏_{j=1, j≠i}^{p−1} V(ρ^j)
    return g_0, g_1

Fig. 1. Algorithm to compute $g_0$ and $g_1$ when $m = p^r$.
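Figure 1 can be sketched in Python for toy parameters. This is an illustration under stated assumptions, not the Sage implementation used for the experiments: the exact resultant in the loop is replaced by computing $\prod_j v(a_j X)$ in floating point and rounding (valid only for small coefficients), the final $(p-1)$-term product is also evaluated in floating point, and all names are hypothetical. The sketch is checked against a brute-force evaluation of $g(X) = \prod_i (v(\rho_i) - X) \bmod X^2$.

```python
import cmath
import math
from fractions import Fraction


def trim(f):
    f = list(f)
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out


def poly_mod(f, F):
    """Remainder of f modulo the monic integer polynomial F."""
    f, n = list(f), len(F) - 1
    for i in range(len(f) - 1, n - 1, -1):
        c = f[i]
        for j in range(n + 1):
            f[i - n + j] -= c * F[j]
    return trim(f[:n]) if len(f) > n else trim(f)


def poly_divexact(f, g):
    """Exact quotient f / g over Z."""
    f, g = [Fraction(c) for c in f], trim(g)
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    for i in range(len(f) - len(g), -1, -1):
        q[i] = f[i + len(g) - 1] / g[-1]
        for j, c in enumerate(g):
            f[i + j] -= q[i] * c
    assert not any(f) and all(c.denominator == 1 for c in q)
    return trim([int(c) for c in q])


def cyclotomic(p, k):
    """Phi_{p^k}(X) = Phi_p(X^(p^(k-1))), coefficients lowest degree first."""
    step = p ** (k - 1)
    out = [0] * ((p - 1) * step + 1)
    out[::step] = [1] * p
    return out


def norm_down(v, p):
    """A(X) with A(X^p) = prod_j v(a_j X); floating point stand-in for the resultant."""
    prod = [complex(1)]
    for s in range(p):
        a = cmath.exp(2j * cmath.pi * s / p)
        prod = poly_mul(prod, [c * a ** e for e, c in enumerate(v)])
    return trim([round(c.real) for c in prod][::p])


def evaluate(f, x):
    return sum(c * x ** e for e, c in enumerate(f))


def compute_g_coefficients(v, p, r):
    """Sketch of Fig. 1: (g0, g1) for m = p^r."""
    k, F = r, cyclotomic(p, r)
    U, V = [1], trim(v)
    while k > 1:                                  # while m > p
        vr = poly_mod(V, F)                       # v <- V mod F
        V = norm_down(vr, p)                      # V(X^p) = prod_j vr(a_j X)
        Vp = [0] * (p * (len(V) - 1) + 1)
        Vp[::p] = V                               # coefficients of V(X^p)
        q = poly_divexact(poly_mul(U, Vp), vr)    # q = U * V(X^p) / v
        u = [p * c if i % p == 0 else 0 for i, c in enumerate(q)]
        U = poly_mod(u, F)[::p]                   # U(X) <- (u mod F)(X^(1/p))
        k -= 1                                    # m <- m / p
        F = cyclotomic(p, k)
    zeta = [cmath.exp(2j * cmath.pi * i / p) for i in range(1, p)]
    Vs = [evaluate(V, z) for z in zeta]
    Us = [evaluate(U, z) for z in zeta]
    g1 = -sum(Us[i] * math.prod(Vs[:i] + Vs[i + 1:]) for i in range(p - 1))
    return round(math.prod(Vs).real), round(g1.real)


def brute_force(v, p, r):
    """(g0, g1) straight from g(X) = prod_i (v(rho_i) - X) mod X^2."""
    m = p ** r
    vals = [evaluate(v, cmath.exp(2j * cmath.pi * k / m)) for k in range(1, m) if k % p]
    g1 = -sum(math.prod(vals[:i] + vals[i + 1:]) for i in range(len(vals)))
    return round(math.prod(vals).real), round(g1.real)


print(compute_g_coefficients([3, 1], 2, 2), brute_force([3, 1], 2, 2))   # (10, -6) (10, -6)
```

For $m = 4$ and $v(X) = X + 3$ both functions return $(10, -6)$, since $g(X) = (3 + i - X)(3 - i - X) = 10 - 6X + X^2$.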

We first describe the performance at four different values of $m$, each with different factorization properties, namely $m = 4391$, $5555$, $6561$ and $10125$, which result in values of $N = \phi(m)$ in the range $[4000, 5400]$. The results (in minutes) for $t = 400$ are given in Table 1.


m                     φ(m)   Original method   New method   Improvement
4391                  4390   274               164          40%
5555 (= 5·11·101)     4000   137               67           51%
6561 (= 3^8)          4374   204               30           85%
10125 (= 3^4·5^3)     5400   451               123          72%

Table 1. Comparison of key generation methods for t = 400 and various values of m. Times are in minutes.
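As a quick consistency check on Table 1, and assuming the last column is the relative time saved truncated to a whole percent, the percentages can be recomputed from the two timing columns; the speed-up ratio is shown alongside. The names are illustrative.

```python
# Times in minutes from Table 1: m -> (original method, new method)
TIMES = {4391: (274, 164), 5555: (137, 67), 6561: (204, 30), 10125: (451, 123)}


def improvement(original, new):
    """Relative time saved, in whole percent (truncated)."""
    return 100 * (original - new) // original


def speedup(original, new):
    return original / new


for m, (orig, new) in TIMES.items():
    print(m, improvement(orig, new), round(speedup(orig, new), 1))
```

This reproduces 40%, 51%, 85% and 72%, and shows that for $m = 6561$ the new method is about 6.8 times faster.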
In Figure 2 we show how the performance of each algorithm is affected by $t$, for a fixed choice of $m$. We test each $m$ with several different choices of the parameter $t$, the bit size of the generated coefficients. The bit length of a key will be approximately $t \cdot N$, so increasing $t$ increases the size of the numbers being computed on, and also requires greater precision for any necessary floating point operations.

It is clear that our new method is significantly faster than the FFT method for all choices of $m$. In particular, when $m$ contains many small repeated factors (here, $m = 6561$ and $10125$) the improvement gained is almost an order of magnitude. When the hybrid approach is taken, we see that the cost of recovering the key by inverting the matrix is far lower than that of using the second (inverse) FFT in the standard FFT method, and results in a speed increase of around 40-50%, as expected.


[Figure 2: plot not recoverable from the extracted text; series: FFT-based method and new method, each for m = 4391, 5555, 6561 and 10125.]

Fig. 2. Comparison of methods for various different values of m, as the parameter t increases.
Abstract. It is well known that any encryption scheme which supports any form of homomorphic operation cannot be secure against adaptive chosen ciphertext attacks. The question then arises as to what is the most stringent security definition achievable by homomorphic encryption schemes. Prior work has shown that various schemes which support a single homomorphic operation can be shown to be IND-CCA1, i.e. secure against lunchtime attacks. In this paper we extend this analysis to the recent fully homomorphic encryption scheme proposed by Gentry, as refined by Gentry, Halevi, Smart and Vercauteren. We show that the basic Gentry scheme is not IND-CCA1; indeed a trivial lunchtime attack allows one to recover the secret key. We then show that a minor modification to the variant of the somewhat homomorphic encryption scheme of Smart and Vercauteren allows one to achieve IND-CCA1, indeed PA-1, in the standard model, assuming a lattice-based knowledge assumption. We also examine the security of the scheme against another security notion, namely security in the presence of ciphertext validity checking oracles, and show why CCA-like notions are important in applications in which multiple parties submit encrypted data to the “cloud” for secure processing.


1 Introduction

That some encryption schemes allow homomorphic operations, or exhibit so-called privacy homomorphisms in the language of Rivest et al. [24], has often been considered a weakness. This is because any scheme which supports homomorphic operations is malleable, and hence is unable to achieve the de facto security definition for encryption, namely IND-CCA2. However, homomorphic encryption schemes do present a number of functional benefits. For example, schemes which support a single additive homomorphic operation have been used to construct secure electronic voting schemes, e.g. [9,12].
The usefulness of schemes supporting a single homomorphic operation has led some authors to consider what security definitions existing homomorphic encryption schemes meet. A natural notion to try to achieve is that of IND-CCA1, i.e. security in the presence of a lunchtime attack. Lipmaa [20] shows that the ElGamal encryption scheme is IND-CCA1 secure with respect to a hard problem which is essentially the same as the IND-CCA1 security of the ElGamal scheme; a line of work recently extended in [2] to other schemes.

A different line of work has been to examine security in the context of Plaintext Awareness, introduced by Bellare and Rogaway [5] in the random oracle model and later refined into a hierarchy of security notions (PA-0, PA-1 and PA-2) by Bellare and Palacio [4]. Intuitively, a scheme is said to be PA if the only way an adversary can create a valid ciphertext is by applying encryption to a public key and a valid message. Bellare and Palacio prove that a scheme which is both PA-1 (resp. PA-2) and IND-CPA is in fact secure against IND-CCA1 (resp. IND-CCA2) attacks.

The advantage of Bellare and Palacio's work is that one works in the standard model to prove security of a scheme; the disadvantage appears to be that one needs to make a strong assumption to prove that a scheme is PA-1 or PA-2. The assumption required is a so-called knowledge assumption. That such a strong assumption is needed should not be surprising, as the PA security notions are themselves very strong. In the context of encryption schemes supporting a single homomorphic operation, Bellare and Palacio show that the Cramer-Shoup Lite scheme [10] and an ElGamal variant introduced by Damgård [11] are both PA-1, and hence IND-CCA1, assuming the standard DDH assumption (to obtain IND-CPA security) and a Diffie-Hellman knowledge assumption (to obtain PA-1 security). Informally, the Diffie-Hellman knowledge assumption is the assumption that an algorithm can only output a Diffie-Hellman tuple if the algorithm “knows” the discrete logarithm of one tuple member with respect to another.

Rivest et al. originally proposed homomorphic encryption schemes so as to enable arbitrary computation on encrypted data. To perform such operations one would require an encryption scheme which supports two homomorphic operations which are “complete”, in the sense of allowing arbitrary computations. Such schemes are called fully homomorphic encryption (FHE) schemes, and it was not until Gentry's breakthrough construction in 2009 [15,16] that such schemes could be constructed. Since Gentry's construction appeared, a number of variants have been proposed, such as [14], as well as various simplifications [27] and improvements thereof [17]. All such schemes have been proved to be IND-CPA, i.e. secure under chosen plaintext attack.

At a high level, all these constructions work in three stages: an initial somewhat homomorphic encryption (SHE) scheme which supports homomorphic evaluation of low-degree polynomials, a process of squashing the decryption circuit, and finally a bootstrapping procedure which gives fully homomorphic encryption and the evaluation of arbitrary functions on ciphertexts. In this paper we focus solely on the basic somewhat homomorphic scheme, but our attacks and analysis also apply to the extension using the bootstrapping process. Our construction of an IND-CCA1 scheme, however, only applies to the SHE constructions, as all existing FHE constructions require public keys which already contain ciphertexts; thus with existing FHE constructions the notion of IND-CCA1 security is redundant, although in Section 7 we present a notion of CCA embeddability which can be extended to FHE.

In this paper we consider the Smart-Vercauteren variant [27] of Gentry's scheme. In this variant there are two possible message spaces: one can either use the scheme to encrypt bits, and hence perform homomorphic operations in $\mathbb{F}_2$, or one can encrypt polynomials of degree $N$ over $\mathbb{F}_2$. When one encrypts bits one achieves a scheme that is a specialisation of the original Gentry scheme, and it is this variant that has recently been realised by Gentry and Halevi [17]. We call this the Gentry-Halevi variant, to avoid confusion with other variants of Gentry's scheme, and we show that this scheme is not IND-CCA1 secure.

In particular, in Section 4 we present a trivial complete break of the Gentry-Halevi variant, in which the secret key can be recovered via a polynomial number of queries to a decryption oracle. The attack we propose works in a similar fashion to the attack of Bleichenbacher on RSA [8], in that on each successive oracle call we reduce the possible interval containing the secret key, based on the output of the oracle. Eventually the interval contains a single element, namely the secret key. Interestingly, all the Bleichenbacher style attacks on RSA [8,21,26] recover a target message, and are hence strictly CCA2 attacks, whereas our attack takes no target ciphertext and recovers the key itself.

In Section 5 we go on to show that a modification of the Smart-Vercauteren SHE variant which encrypts polynomials can be shown to be PA-1, and hence is IND-CCA1. Informally, in the decryption phase we use the full Smart-Vercauteren variant to recover the random polynomial used to encrypt the plaintext polynomial, and then we re-encrypt the result to check against the ciphertext. This forms a ciphertext validity check which then allows us to show PA-1 security based on a new lattice knowledge assumption. Our lattice knowledge assumption is a natural lattice-based variant of the Diffie-Hellman knowledge assumption mentioned previously. In particular, we assume that if an algorithm is able to output a non-lattice vector which is sufficiently close to a lattice vector, then it must “know” the corresponding close lattice vector. We hope that this problem may be of independent interest in analysing other lattice-based cryptographic schemes; indeed the notion is closely linked to a key “quantum” step in the results of Regev [23].

In Section 6 we examine possible extensions of the security notion for homomorphic encryption. We have remarked that a homomorphic encryption scheme (either one which supports a single homomorphic operation, or an SHE/FHE scheme) cannot be IND-CCA2, but we have examples of singly homomorphic and SHE schemes which are IND-CCA1. The question then arises as to whether IND-CCA1 is the “correct” security definition, i.e. whether this is the strongest definition one can obtain for SHE schemes. In other contexts authors have considered attacks involving partial information oracles. In [13] Dent introduces the notion of a CPA+ attack, where the adversary is given access to an oracle which, on input of a ciphertext, outputs a single bit indicating whether the ciphertext is valid or not. Such a notion was originally introduced by Joye, Quisquater and Yung [19] in the context of attacking a variant of the EPOC-2 cipher which had been “proved” IND-CCA2. This notion was recently re-introduced under the name of a CVA (ciphertext verification) attack by Hu et al. [18], in the context of symmetric encryption schemes. We use the term CVA rather than CPA+ as it conveys the meaning of the security notion more easily.

Such ciphertext validity oracles are actually the key component behind the traditional application of Bleichenbacher style attacks against RSA, in that one uses the oracle to recover information about the target plaintext. We show that our SHE scheme which is IND-CCA1 is not IND-CVA, by presenting an IND-CVA attack. In particular, this shows that CVA security is not implied by PA-1 security. Given that PA-1 is such a strong notion, this is itself interesting, since it shows that CVA attacks are relatively powerful. The attack is not of the Bleichenbacher type, but is more akin to the security reduction between search and decision LWE [25]. This attack opens up the question of a new SHE scheme which is also IND-CVA, a topic which we leave as an open problem, as is the construction of standard additive or multiplicative homomorphic schemes which are IND-CVA.

Finally, in Section 7 we consider an application area of cloud computing in which multiple players submit encrypted data to a cloud computer, which in turn performs computations on the encrypted data. We show that such a scenario does indeed seem to require a form of IND-CCA2 protection of ciphertexts, while still maintaining homomorphic properties. To deal with this we introduce the notion of CCA-embeddable homomorphic encryption.

2 Notation and Standard Definitions

For integers $z, d$, reduction of $z$ modulo $d$ into the interval $[-d/2, d/2)$ will be denoted by $[z]_d$. For a rational number $q$, $\lfloor q \rceil$ will denote the rounding of $q$ to the nearest integer, and $[q]$ denotes the (signed) distance between $q$ and the nearest integer, i.e. $[q] = q - \lfloor q \rceil$. The notation $a \leftarrow b$ means assign the object $b$ to $a$, whereas $a \xleftarrow{\$} B$ for a set $B$ means assign $a$ uniformly at random from the set $B$. If $B$ is an algorithm, this means assign to $a$ the output of $B$, where the probability distribution is over the random coins of $B$.
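The three notations can be pinned down with a small Python sketch. The tie-breaking rule for rounding (halves round up) is an assumption, since the text does not specify it; the function names are hypothetical.

```python
import math
from fractions import Fraction


def centered_mod(z, d):
    """[z]_d: the representative of z modulo d in the interval [-d/2, d/2)."""
    r = z % d                          # Python's % gives 0 <= r < d for d > 0
    return r - d if 2 * r >= d else r


def round_nearest(q):
    """Nearest integer to q; ties are rounded up (an assumption)."""
    return math.floor(Fraction(q) + Fraction(1, 2))


def signed_distance(q):
    """[q] = q - round(q), the signed distance from q to the nearest integer."""
    return Fraction(q) - round_nearest(q)


print(centered_mod(7, 10), centered_mod(5, 10), round_nearest(Fraction(7, 3)),
      signed_distance(Fraction(8, 3)))    # -3 -5 2 -1/3
```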

For a polynomial $F(X) \in \mathbb{Q}[X]$ we let $\|F(X)\|_\infty$ denote the $\infty$-norm of the coefficient vector, i.e. the maximum coefficient in absolute value. If $F(X) \in \mathbb{Q}[X]$ then we let $\lfloor F(X) \rceil$ denote the polynomial in $\mathbb{Z}[X]$ obtained by rounding the coefficients of $F(X)$ to the nearest integer.

Fully Homomorphic Encryption: A fully homomorphic encryption scheme is a tuple of three algorithms $\mathcal{E} = (\mathsf{KeyGen}, \mathsf{Encrypt}, \mathsf{Decrypt})$ in which the message space is a ring $(R, +, \cdot)$ and the ciphertext space is also a ring $(\mathcal{R}, \oplus, \otimes)$ such that, for all messages $m_1, m_2 \in R$ and all outputs $(\mathsf{pk}, \mathsf{sk}) \leftarrow \mathsf{KeyGen}(1^\lambda)$, we have

$$m_1 + m_2 = \mathsf{Decrypt}(\mathsf{Encrypt}(m_1, \mathsf{pk}) \oplus \mathsf{Encrypt}(m_2, \mathsf{pk}), \mathsf{sk}),$$
$$m_1 \cdot m_2 = \mathsf{Decrypt}(\mathsf{Encrypt}(m_1, \mathsf{pk}) \otimes \mathsf{Encrypt}(m_2, \mathsf{pk}), \mathsf{sk}).$$

A scheme is said to be somewhat homomorphic if it can deal with only a limited number of additions and multiplications before decryption fails.

Security Notions for Public Key Encryption: Semantic security of a public key encryption scheme, whether standard, homomorphic, or fully homomorphic, is captured by the following game between a challenger and an adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$, running in two stages:

- $(\mathsf{pk}, \mathsf{sk}) \leftarrow \mathsf{KeyGen}(1^\lambda)$.
- $(m_0, m_1, \mathsf{St}) \leftarrow \mathcal{A}_1^{(\cdot)}(\mathsf{pk})$. /* Stage 1 */
- $b \xleftarrow{\$} \{0, 1\}$.
- $c^* \leftarrow \mathsf{Encrypt}(m_b, \mathsf{pk}; r)$.
- $b' \leftarrow \mathcal{A}_2^{(\cdot)}(c^*, \mathsf{St})$. /* Stage 2 */

The adversary is said to win the game if $b = b'$, with the advantage of the adversary in winning the game being defined by

$$\mathsf{Adv}_{\mathcal{A}, \mathcal{E}, \lambda}^{\text{IND-atk}} = \left| \Pr(b = b') - 1/2 \right|.$$

A scheme is said to be IND-atk secure if no polynomial time adversary $\mathcal{A}$ can win the above game with non-negligible advantage in the security parameter $\lambda$. The precise security notion one obtains depends on the oracle access one gives the adversary in its different stages.

- If $\mathcal{A}$ has access to no oracles in either stage then atk = CPA.
- If $\mathcal{A}$ has access to a decryption oracle in stage one then atk = CCA1.
- If $\mathcal{A}$ has access to a decryption oracle in both stages (in stage two, on any ciphertext other than $c^*$) then atk = CCA2, often now denoted simply CCA.
- If $\mathcal{A}$ has access to a ciphertext validity oracle in both stages, which on input of a ciphertext determines whether decryption would output $\perp$ or not, then atk = CVA.


Lattices: A (full-rank) lattice is simply a discrete subgroup of $\mathbb{R}^n$ generated by $n$ linearly independent vectors, $B = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$, called a basis. Every lattice of rank $n \geq 2$ has infinitely many bases, any two of which are related by a unimodular transformation matrix. If $B$ is such a set of vectors, we write

$$L = \mathcal{L}(B) = \{\mathbf{v} \cdot B \mid \mathbf{v} \in \mathbb{Z}^n\}$$

for the resulting lattice. An integer lattice is a lattice in which all the basis vectors have integer coordinates.

For any basis there is an associated fundamental parallelepiped, which can be taken as $\mathcal{P}(B) = \{\sum_{i=1}^{n} x_i \cdot \mathbf{b}_i \mid x_i \in [-1/2, 1/2)\}$. The volume of this fundamental parallelepiped is given by the absolute value of the determinant of the basis matrix, $\Delta = |\det(B)|$. We denote by $\lambda_\infty(L)$ the $\infty$-norm of a shortest vector (for the $\infty$-norm) in $L$.
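A small exact computation illustrates that the volume $\Delta = |\det(B)|$ does not depend on the choice of basis: multiplying the basis matrix (rows are basis vectors) on the left by a unimodular matrix $U$ gives another basis of the same lattice with the same $|\det|$. The matrices below are arbitrary toy examples and the helper names are illustrative.

```python
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination over Q."""
    A = [[Fraction(x) for x in row] for row in M]
    n, result = len(A), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            result = -result
        result *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return result


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


B = [[2, 1], [0, 3]]     # rows are the basis vectors b_1, b_2
U = [[2, 1], [1, 1]]     # unimodular: integer entries and det(U) = 1
print(abs(det(B)), abs(det(mat_mul(U, B))))   # 6 6
```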

3 The Smart-Vercauteren Variant of Gentry's Scheme

We will be examining variants of Gentry's SHE scheme [15], in particular three variants based on the simplification of Smart and Vercauteren [27], as optimized by Gentry and Halevi [17]. All variants make use of the same key generation procedure, parametrized by a tuple of integers $(N, t, \mu)$; we assume there is a function mapping security parameters $\lambda$ into tuples $(N, t, \mu)$. In practice $N$ will be a power of two, $t$ will be greater than $2^{\sqrt{N}}$ and $\mu$ will be a small integer, perhaps one.


$\mathsf{KeyGen}(1^\lambda)$

- Pick an irreducible polynomial $F \in \mathbb{Z}[X]$ of degree $N$.
- Pick a polynomial $G(X) \in \mathbb{Z}[X]$ of degree at most $N - 1$, with coefficients bounded by $t$.
- $d \leftarrow \mathrm{resultant}(F, G)$.
- $G$ is chosen such that $G(X)$ has a single unique root in common with $F(X)$ modulo $d$. Let $\alpha$ denote this root.
- $Z(X) \leftarrow d/G(X) \pmod{F(X)}$.
- $\mathsf{pk} \leftarrow (\alpha, d, \mu, F(X))$, $\mathsf{sk} \leftarrow (Z(X), G(X), d, F(X))$.
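For intuition, here is a toy instance of KeyGen in Python, restricted (as a simplifying assumption) to $F(X) = X^N + 1$ with $N$ a power of two and a linear $G(X) = X + g$. Then the unique common root is forced to be $\alpha = -g \bmod d$, the resultant is $d = F(-g)$, and $Z(X)$ comes directly from the division $F(X) = G(X) \cdot Q(X) + d$, since $Z = -Q$ satisfies $Z \cdot G = d - F$. Such keys are far too small to be secure, and the names are hypothetical.

```python
def poly_mul_mod(f, g, N):
    """Product of f and g modulo X^N + 1 (coefficients lowest degree first)."""
    out = [0] * N
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j < N:
                out[i + j] += a * b
            else:
                out[i + j - N] -= a * b        # X^N = -1
    return out


def keygen_linear(g, N):
    """Toy KeyGen for F = X^N + 1 and G(X) = X + g; returns (d, alpha, Z)."""
    assert N >= 2 and N & (N - 1) == 0          # N a power of two, hence even
    r = -g                                      # the root of G
    d = r ** N + 1                              # resultant(F, G) = F(-g)
    alpha = r % d                               # common root of F and G mod d
    Z = [-(r ** (N - 1 - k)) for k in range(N)] # Z = -Q where F = G * Q + d
    return d, alpha, Z


d, alpha, Z = keygen_linear(2, 4)
print(d, alpha, Z, poly_mul_mod(Z, [2, 1], 4))   # 17 15 [8, -4, 2, -1] [17, 0, 0, 0]
```

The last value confirms $Z(X) \cdot G(X) \equiv d \pmod{F(X)}$.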

In [17] Gentry and Halevi show how to compute, for the polynomial $F(X) = X^N + 1$, the root $\alpha$ and the polynomial $Z(X)$ using a method based on the Fast Fourier Transform. In particular, they show how this can be done for non-prime values of $d$ (removing one of the main restrictions in the key generation method proposed in [27]).

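By construction, the principal ideal $\mathfrak{p}$ generated by $G(X)$ in the number field $K = \mathbb{Q}[X]/(F(X))$ is equal to the ideal with $\mathcal{O}_K$ basis $(d, X - \alpha)$. In particular, the ideal $\mathfrak{p}$ consists precisely of all elements in $\mathbb{Z}[X]/(F(X))$ that are zero when evaluated at $\alpha$ modulo $d$. The Hermite Normal Form of a basis matrix of the lattice defined by the coefficient vectors of $\mathfrak{p}$ is given by

$$\begin{pmatrix}
d & 0 & 0 & \cdots & 0 \\
-\alpha & 1 & 0 & \cdots & 0 \\
-\alpha^2 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-\alpha^{N-1} & 0 & 0 & \cdots & 1
\end{pmatrix} \qquad (1)$$

where the elements in the first column are reduced modulo $d$.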
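Equation (1) and the membership criterion can be sketched as follows, using the toy values $d = 17$ and $\alpha = 15 \equiv -2$ that arise from $F(X) = X^4 + 1$ and $G(X) = X + 2$ (an illustrative choice). Because the basis is triangular with unit diagonal below the first row, a coefficient vector lies in the lattice exactly when the corresponding polynomial vanishes at $\alpha$ modulo $d$. The function names are hypothetical.

```python
def hnf_basis(d, alpha, N):
    """Rows of the HNF basis (1): (d, 0, ..., 0) and (-alpha^i mod d, e_i)."""
    rows = [[d] + [0] * (N - 1)]
    for i in range(1, N):
        row = [0] * N
        row[0] = -pow(alpha, i, d) % d      # first column reduced modulo d
        row[i] = 1
        rows.append(row)
    return rows


def in_ideal(coeffs, d, alpha):
    """True iff the coefficient vector lies in the lattice, i.e. H(alpha) = 0 mod d."""
    return sum(c * pow(alpha, i, d) for i, c in enumerate(coeffs)) % d == 0


basis = hnf_basis(17, 15, 4)
print(basis)   # [[17, 0, 0, 0], [2, 1, 0, 0], [13, 0, 1, 0], [8, 0, 0, 1]]
print(all(in_ideal(row, 17, 15) for row in basis), in_ideal([2, 1, 0, 0], 17, 15))   # True True
```

Note that the second row of this basis is the coefficient vector of $G$ itself.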

To aid what follows we write $Z(X) = z_0 + z_1 \cdot X + \ldots + z_{N-1} \cdot X^{N-1}$ and define

$$\delta_\infty = \sup\left\{ \frac{\|g(X) \cdot h(X) \pmod{F(X)}\|_\infty}{\|g(X)\|_\infty \cdot \|h(X)\|_\infty} : g, h \in \mathbb{Z}[X],\ \deg(g), \deg(h) < N \right\}.$$

For the choice $F(X) = X^N + 1$, we have $\delta_\infty = N$. The key result needed to understand how the simplification of Smart and Vercauteren to Gentry's scheme works is the following lemma, adapted from [27].

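Lemma 1. Let $Z(X)$, $G(X)$, $\alpha$ and $d$ be as defined in the above key generation procedure. If $C(X) \in \mathbb{Z}[X]/(F(X))$ is a polynomial with $\|C(X)\|_\infty < U$ and we set $c = C(\alpha) \pmod d$, then

$$C(X) = c - \left\lfloor \frac{c \cdot Z(X)}{d} \right\rceil \cdot G(X) \pmod{F(X)}$$

for

$$U = \frac{d}{2 \cdot \delta_\infty \cdot \|Z(X)\|_\infty}.$$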
Proof. By definition of $c$, we have that $c - C(X)$ is contained in the principal ideal generated by $G(X)$, and thus there exists a $q(X) \in \mathbb{Z}[X]/(F(X))$ such that $c - C(X) = q(X) \cdot G(X)$. Using $Z(X) = d/G(X) \pmod{F(X)}$, we can write

$$q(X) = \frac{c \cdot Z(X)}{d} - \frac{C(X) \cdot Z(X)}{d} \pmod{F(X)}.$$

Since $q(X)$ has integer coefficients, we can recover it by rounding the coefficients of the first term if the coefficients of the second term are strictly bounded by $1/2$. By the definition of $\delta_\infty$, the coefficients of the second term are at most $\delta_\infty \cdot \|C(X)\|_\infty \cdot \|Z(X)\|_\infty / d$ in absolute value. This shows that $C(X)$ can be recovered from $c$ for $\|C(X)\|_\infty < d/(2 \cdot \delta_\infty \cdot \|Z(X)\|_\infty)$.

Note that the above lemma essentially states that if $\|C(X)\|_\infty < U$, then $C(X)$ is determined uniquely by its evaluation in $\alpha$ modulo $d$. Recall that any polynomial $H(X)$ of degree less than $N$, whose coefficient vector is in the lattice $L$ defined in equation (1), satisfies $H(\alpha) \equiv 0 \pmod d$. Therefore, if $H(X) \neq 0$, the lemma implies, for such an $H$, that $\|H(X)\|_\infty \ge U$, and thus we conclude that $U \le \lambda_\infty(L)$. Since the coefficient vector of $G(X)$ is clearly in the lattice $L$, we conclude that

$$U \le \lambda_\infty(L) \le \|G(X)\|_\infty.$$
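Lemma 1 can be exercised on a toy key. The sketch assumes $N = 2$, $F(X) = X^2 + 1$ and the hypothetical key $G(X) = 1000003 + 999998X$ (regenerated here so the block runs on its own). Only $c = C(\alpha) \bmod d$ is kept, and $C(X)$ is recovered by coefficient-wise rounding, provided $\|C(X)\|_\infty < U$. The rounding uses exact integer arithmetic rather than floating point, because $c \cdot z_i$ is far larger than a double can represent exactly.

```python
from math import gcd


def keygen_toy_n2(g0, g1):
    """Toy KeyGen for N = 2, F(X) = X^2 + 1, G(X) = g0 + g1*X (hypothetical parameters)."""
    if gcd(g0, g1) != 1:
        raise ValueError("this sketch assumes gcd(g0, g1) = 1")
    d = g0 * g0 + g1 * g1
    return (-g0 * pow(g1, -1, d)) % d, d, (g0, -g1)


def round_div(a: int, d: int) -> int:
    """Nearest integer to a/d for d > 0, in exact integer arithmetic."""
    return (2 * a + d) // (2 * d)


def decode(c: int, G, Z, d: int):
    """Lemma 1: C(X) = c - round(c * Z(X) / d) * G(X) (mod X^2 + 1)."""
    q0, q1 = round_div(c * Z[0], d), round_div(c * Z[1], d)
    g0, g1 = G
    # q(X) * G(X) reduced modulo X^2 + 1, using X^2 = -1
    return (c - (q0 * g0 - q1 * g1), -(q0 * g1 + q1 * g0))


G = (1000003, 999998)
alpha, d, Z = keygen_toy_n2(*G)
delta_inf = 2                                          # delta_inf = N for X^N + 1
U = d // (2 * delta_inf * max(abs(z) for z in Z))      # integer part of the bound U

C = (123457, -98765)                                   # any C with ||C||_inf < U
c = (C[0] + C[1] * alpha) % d                          # c = C(alpha) mod d
print(U, decode(c, G, Z, d))
```

Any representative of $c$ modulo $d$ works, since $d = G(X) Z(X)$ itself lies in the ideal generated by $G(X)$.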

Although Lemma 1 provides the maximum value of $U$ for which ciphertexts are decryptable, we will only allow a quarter of this maximum value, i.e. $T = U/4$. As such we are guaranteed that $T \le \lambda_\infty(L)/4$. We note that $T$ defines the size of the circuit that the somewhat homomorphic encryption scheme can deal with. Our choice of $T$ will become clear in Section 5.

Using the above key generation method we can define three variants of the Smart-Vercauteren variant of Gentry's scheme. The first variant is the one used in the Gentry/Halevi implementation of [17], the second is the general variant proposed by Smart and Vercauteren, whereas the third divides the decryption procedure into two steps and provides a ciphertext validity check. In later sections we shall show that the first variant is not IND-CCA1 secure, and by extension neither is the second variant. However, we will show that the third variant is indeed IND-CCA1. We will then show that the third variant is not IND-CVA secure.

Each of the following variants is only a somewhat homomorphic scheme; extending it to a fully homomorphic scheme can be performed using the methods of [15,16,17].

Gentry-Halevi Variant: The plaintext space is the field $\mathbb{F}_2$. The above KeyGen algorithm is modified to only output keys for which $d \equiv 1 \pmod 2$. This implies that at least one coefficient of $Z(X)$, say $z_{i_0}$, will be odd (if all were even, $d = G(X) Z(X) \bmod F(X)$ would be even). We replace $Z(X)$ in the private key with $z_{i_0}$, and can drop the values $G(X)$ and $F(X)$ entirely from the private key. Encryption and decryption can now be defined via the functions:

Encrypt(m, pk; r):
- $R(X) \leftarrow \mathbb{Z}[X]$ s.t. $\|R(X)\|_\infty \le \mu$.
- $C(X) \leftarrow m + 2 \cdot R(X)$.
- $c \leftarrow [C(\alpha)]_d$.
- Return $c$.

Decrypt(c, sk):
- $m \leftarrow [c \cdot z_{i_0}]_d \pmod 2$.
- Return $m$.
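A toy run of this variant, assuming $N = 2$, $F(X) = X^2 + 1$ and the hypothetical key $G(X) = 1000003 + 999998X$ (with $g_0$ odd and $g_1$ even, so $d = g_0^2 + g_1^2$ is odd and the coefficient $z = g_0$ of $Z(X) = g_0 - g_1 X$ is odd). Decryption is then a single modular multiplication followed by a parity bit. The noise bound $\mu = 1000$ is an arbitrary illustrative choice that keeps $\|C(X)\|_\infty \le 1 + 2\mu$ far below the bound $U$ of Lemma 1.

```python
import random
from math import gcd


def centered_mod(x: int, d: int) -> int:
    """[x]_d for odd d: the representative of x mod d in [-d/2, d/2)."""
    r = x % d
    return r - d if r > d // 2 else r


# Hypothetical toy key, N = 2 and F(X) = X^2 + 1: G(X) = g0 + g1*X with g0 odd and
# g1 even, so d = g0^2 + g1^2 is odd and the coefficient z = g0 of Z(X) is odd.
g0, g1 = 1000003, 999998
assert gcd(g0, g1) == 1
d = g0 * g0 + g1 * g1
alpha = (-g0 * pow(g1, -1, d)) % d
pk, z = (alpha, d), g0                        # private key reduced to one odd coefficient


def encrypt(m: int, pk, rng, mu: int = 1000) -> int:
    alpha, d = pk
    r0, r1 = rng.randint(-mu, mu), rng.randint(-mu, mu)   # R(X) with ||R||_inf <= mu
    return centered_mod(m + 2 * r0 + 2 * r1 * alpha, d)   # c = [C(alpha)]_d


def decrypt(c: int, z: int, d: int) -> int:
    return centered_mod(c * z, d) % 2                     # m = [c * z]_d mod 2


rng = random.Random(2024)
print([decrypt(encrypt(m, pk, rng), z, d) for m in (0, 1, 1, 0)])
```

Decryption works because $[c \cdot z_{i_0}]_d$ equals the $i_0$-th coefficient of $C(X) Z(X) \bmod F(X)$, which is congruent to $m \cdot z_{i_0}$ modulo 2.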
Full-Space Smart-Vercauteren: In this variant the plaintext space is the algebra $\mathbb{F}_2[X]/(F(X))$, where messages are given by binary polynomials of degree less than $N$. As such we call this the Full-Space Smart-Vercauteren system, as the plaintext space is the full set of binary polynomials, with multiplication and addition defined modulo $F(X)$. We modify the above key generation algorithm so that it only outputs keys for which the polynomial $G(X)$ satisfies $G(X) \equiv 1 \pmod 2$. This results in algorithms defined by:

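Encrypt(M(X), pk; r):
- $R(X) \leftarrow \mathbb{Z}[X]$ s.t. $\|R(X)\|_\infty \le \mu$.
- $C(X) \leftarrow M(X) + 2 \cdot R(X)$.
- $c \leftarrow [C(\alpha)]_d$.
- Return $c$.

Decrypt(c, sk):
- $C(X) \leftarrow c - \lfloor c \cdot Z(X)/d \rceil$.
- $M(X) \leftarrow C(X) \pmod 2$.
- Return $M(X)$.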

That decryption works, assuming the input ciphertext corresponds to the evaluation of a polynomial with coefficients bounded by $T$, follows from Lemma 1 and the fact that $G(X) \equiv 1 \pmod 2$.
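A toy sketch of the Full-Space variant, assuming $N = 2$, $F(X) = X^2 + 1$ and the hypothetical key $G(X) = 1000003 + 999998X$, which satisfies $G(X) \equiv 1 \pmod 2$. Decryption computes $c - \lfloor c \cdot Z(X)/d \rceil$ without multiplying by $G(X)$: modulo 2 the product with $G(X)$ is the identity, so the result agrees with $C(X)$ in every coefficient's parity.

```python
import random
from math import gcd


def centered_mod(x, d):
    r = x % d
    return r - d if r > d // 2 else r


def round_div(a, d):
    return (2 * a + d) // (2 * d)   # nearest integer to a/d (d > 0)


# Hypothetical N = 2 toy key with G(X) = g0 + g1*X = 1 (mod 2): g0 odd, g1 even.
g0, g1 = 1000003, 999998
assert gcd(g0, g1) == 1
d = g0 * g0 + g1 * g1
alpha = (-g0 * pow(g1, -1, d)) % d
Z = (g0, -g1)


def encrypt(M, rng, mu=1000):
    """M = (m0, m1) in F_2[X]/(X^2 + 1); returns c = [C(alpha)]_d."""
    C = [M[i] + 2 * rng.randint(-mu, mu) for i in range(2)]   # C(X) = M(X) + 2 R(X)
    return centered_mod(C[0] + C[1] * alpha, d)


def decrypt(c):
    """M(X) = c - round(c * Z(X) / d) (mod 2); no G(X) needed since G = 1 (mod 2)."""
    q0, q1 = round_div(c * Z[0], d), round_div(c * Z[1], d)
    return ((c - q0) % 2, (-q1) % 2)


rng = random.Random(5)
print(decrypt(encrypt((1, 0), rng)))
```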

ccSHE: This is our ciphertext-checking SHE scheme (or ccSHE scheme for short). This is exactly like the above Full-Space Smart-Vercauteren variant in terms of key generation, but we now check the ciphertext before we output the message. Thus encryption/decryption become:

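Encrypt(M(X), pk; r):
- $R(X) \leftarrow \mathbb{Z}[X]$ s.t. $\|R(X)\|_\infty \le \mu$.
- $C(X) \leftarrow M(X) + 2 \cdot R(X)$.
- $c \leftarrow [C(\alpha)]_d$.
- Return $c$.

Decrypt(c, sk):
- $C(X) \leftarrow c - \lfloor c \cdot Z(X)/d \rceil \cdot G(X)$.
- $C(X) \leftarrow C(X) \pmod{F(X)}$.
- $c' \leftarrow [C(\alpha)]_d$.
- If $c' \neq c$ or $\|C(X)\|_\infty \ge T$ return $\perp$.
- $M(X) \leftarrow C(X) \pmod 2$.
- Return $M(X)$.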
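A toy sketch of ccSHE decryption with the validity check, assuming $N = 2$, $F(X) = X^2 + 1$, the hypothetical key $G(X) = 1000003 + 999998X$, and `None` standing in for $\perp$. This sketch rejects when $\|C(X)\|_\infty \ge T$ with $T = \lfloor U/4 \rfloor$. Three probes show the two rejection paths: an honest ciphertext is accepted, the ciphertext of the constant polynomial $2T$ fails the norm test, and the non-canonical representative $5 + d$ fails the $c' = c$ test.

```python
import random
from math import gcd


def centered_mod(x, d):
    r = x % d
    return r - d if r > d // 2 else r


def round_div(a, d):
    return (2 * a + d) // (2 * d)


# Hypothetical N = 2 toy key with G = 1 (mod 2).
g0, g1 = 1000003, 999998
assert gcd(g0, g1) == 1
d = g0 * g0 + g1 * g1
alpha = (-g0 * pow(g1, -1, d)) % d
Z = (g0, -g1)
U = d // (2 * 2 * max(abs(g0), abs(g1)))   # Lemma 1 bound with delta_inf = N = 2
T = U // 4                                  # only a quarter of U is accepted


def decrypt_cc(c):
    """ccSHE decryption; returns None in place of the symbol for rejection."""
    q0, q1 = round_div(c * Z[0], d), round_div(c * Z[1], d)
    C = (c - (q0 * g0 - q1 * g1), -(q0 * g1 + q1 * g0))     # c - round(cZ/d)*G mod F
    c_prime = centered_mod(C[0] + C[1] * alpha, d)          # c' = [C(alpha)]_d
    if c_prime != c or max(abs(C[0]), abs(C[1])) >= T:
        return None
    return (C[0] % 2, C[1] % 2)


rng = random.Random(11)
M = (1, 1)
C = [M[i] + 2 * rng.randint(-1000, 1000) for i in range(2)]
c = centered_mod(C[0] + C[1] * alpha, d)
print(decrypt_cc(c), decrypt_cc(2 * T), decrypt_cc(5 + d))
```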

4 CCA1 attack on the Gentry-Halevi Variant

We construct an IND-CCA1 attacker against the above Gentry-Halevi variant. Let $z$ be the secret key, i.e. the specific odd coefficient of $Z(X)$ chosen by the decryptor. Note that we can assume $z \in [0, d)$, since decryption in the Gentry-Halevi variant works for any secret key $z + k \cdot d$ with $k \in \mathbb{Z}$. We assume the attacker has access to a decryption oracle $\mathcal{O}_D(c)$, to which it can make polynomially many queries. On each query the oracle returns the value of $[c \cdot z]_d \pmod 2$.

In Algorithm 1 we present pseudo-code to describe how the attack proceeds. We start with an interval $[L, \ldots, U]$ which is known to contain the secret key $z$, and in each iteration we split the interval into two halves determined by a specific ciphertext $c$. The choice of which sub-interval to take next depends on whether $k$ multiples of $d$ are sufficient to reduce $c \cdot z$ into the range $[-d/2, \ldots, d/2)$ or whether $k + 1$ multiples are required.

Algorithm 1: CCA1 attack on the Gentry-Halevi Variant

    $L \leftarrow 0$, $U \leftarrow d - 1$
    while $U - L > 1$ do
        $c \leftarrow \lfloor d/(U - L) \rfloor$
        $b \leftarrow \mathcal{O}_D(c)$
        $q \leftarrow (c + b) \bmod 2$
        $k \leftarrow \lfloor Lc/d + 1/2 \rfloor$
        $B \leftarrow (k + 1/2)\, d/c$
        if ($k \bmod 2 = q$) then
            $U \leftarrow \lfloor B \rfloor$
        else
            $L \leftarrow \lceil B \rceil$
    return $L$

Analysis: The core idea of the algorithm is simple: in each step we choose a "ciphertext" $c$ such that the length of the interval for the quantity $c \cdot z$ is bounded by $d$. Since in each step $z \in [L, U]$, we need to take $c = \lfloor d/(U - L) \rfloor$. As such it is easy to see that $c(U - L) \le d$.

To reduce $cL$, we need to subtract $kd$ such that $-d/2 \le cL - kd < d/2$, which shows that $k = \lfloor Lc/d + 1/2 \rfloor$. Furthermore, since the length of the interval for $c \cdot z$ is bounded by $d$, there will be exactly one number of the form $d/2 + id$ in $[cL, cU]$, namely $d/2 + kd$. This means that there is exactly one boundary $B = (k + 1/2)d/c$ in the interval for $z$.

Define $q$ as the unique integer such that $-d/2 \le cz - qd < d/2$; then, since the length of the interval for $c \cdot z$ is bounded by $d$, we either have $q = k$ or $q = k + 1$. To distinguish between the two cases, we simply look at the output of the decryption oracle: recall that the oracle outputs $[c \cdot z]_d \pmod 2$, i.e. the bit output by the oracle is

$$b = c \cdot z - q \cdot d \pmod 2 = (c + q) \pmod 2,$$

where the last equality uses that both $z$ and $d$ are odd. Therefore, $q = (b + c) \pmod 2$, which allows us to choose between the cases $k$ and $k + 1$. If $q \equiv k \pmod 2$, then $z$ lies in the first part $[L, \lfloor B \rfloor]$, whereas in the other case, $z$ lies in the second part $[\lceil B \rceil, U]$.

Having proved correctness we now estimate the running time. The behaviour of the algorithm is easily seen to be as follows: in each step, we obtain a boundary $B$ in the interval $[L, U]$ and the next interval becomes either $[L, \lfloor B \rfloor]$ or $[\lceil B \rceil, U]$. Since $B$ can be considered random in $[L, U]$, as can the choice of the interval, this shows that in each step the size of the interval decreases by a factor 2 on average. In conclusion we deduce that recovering the secret key will require $O(\log d)$ calls to the oracle.

The  above  attack  is  highly  efficient  in  practice  and  recovers  keys  in  a  matter  of 
seconds  for  all  parameter  sizes  in  [17]. 
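A small simulation of Algorithm 1 against a toy oracle that computes $[c \cdot z]_d \bmod 2$ for a hypothetical odd secret $z$ and odd modulus $d$. Beyond the pseudo-code, this sketch caps the number of queries and clamps the updates with `min`/`max` so that the interval can never grow; both are safety assumptions of the sketch, not part of Algorithm 1, which assigns $\lfloor B \rfloor$ and $\lceil B \rceil$ directly. Each update is sound ($z$ stays in $[L, U]$) because $c(U - L) \le d$ forces $q \in \{k, k+1\}$, and $B = (2k+1)d/(2c)$ is never an integer when $d$ is odd.

```python
def centered_mod(x: int, d: int) -> int:
    """[x]_d for odd d: the representative of x mod d in [-d/2, d/2)."""
    r = x % d
    return r - d if r > d // 2 else r


def make_oracle(z: int, d: int):
    """Toy Gentry-Halevi decryption oracle O_D(c) = [c * z]_d mod 2 (z, d odd)."""
    return lambda c: centered_mod(c * z, d) % 2


def cca1_attack(oracle, d: int, max_queries: int = 10_000):
    """Interval halving following Algorithm 1; returns (L, U, number of queries)."""
    L, U, queries = 0, d - 1, 0
    while U - L > 1 and queries < max_queries:
        c = d // (U - L)                          # ensures c * (U - L) <= d
        b = oracle(c)
        queries += 1
        q = (c + b) % 2                           # parity of the quotient q
        k = (2 * L * c + d) // (2 * d)            # floor(L*c/d + 1/2)
        floor_B = ((2 * k + 1) * d) // (2 * c)    # floor of B = (k + 1/2) d / c
        if k % 2 == q:
            U = min(U, floor_B)                   # q = k, so z < B (clamp: sketch only)
        else:
            L = max(L, floor_B + 1)               # q = k + 1, so z > B (clamp: sketch only)
    return L, U, queries


# Hypothetical toy instance: d odd, secret z odd in [0, d).
d, z = 1000003, 777777
L, U, n = cca1_attack(make_oracle(z, d), d)
print(L, U, n)
```

When the loop stops with $U - L \le 1$, the secret is one of the two endpoints, and since $z$ is odd the odd endpoint is the key.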

5  ccSHE  is  PA-1 

In this section we prove that the ccSHE encryption scheme given earlier is PA-1, assuming a lattice knowledge assumption holds. We first recap the definition of PA-1 in the standard model, and then we introduce our lattice knowledge assumption. Once this is done we present the proof.
Plaintext Awareness - PA-1: The original intuition for the introduction of plaintext awareness was as follows: if an adversary knows the plaintext corresponding to every ciphertext it produces, then the adversary has no need for a decryption oracle and hence PA + IND-CPA must imply IND-CCA. Unfortunately, there are subtleties in the definition of plaintext awareness, leading to three definitions, PA-0, PA-1 and PA-2. However, after suitably formalizing the definitions, PA-x plus IND-CPA implies IND-CCAx, for x = 1 and 2. In our context we are only interested in IND-CCA1 security, so we will only discuss the notion of PA-1 in this paper.

Before  formalizing  PA-1  it  is  worth  outlining  some  of  the  terminology.  We  have  a 
polynomial  time  adversary  A  called  a  ciphertext  creator,  that  takes  as  input  a  public  key 
and  can  query  ciphertexts  to  an  oracle.  An  algorithm  A*  is  called  a  successful  extractor 
for  A  if  it  can  provide  responses  to  A  which  are  computationally  indistinguishable  from 
those  provided  by  a  decryption  oracle.  In  particular  a  scheme  is  said  to  be  PA-1  if  there 
exists  a  successful  extractor  for  any  ciphertext  creator  that  makes  a  polynomial  number 
of  queries.  The  extractor  gets  the  same  public  key  as  A  and  also  has  access  to  the 
random  coins  used  by  algorithm  A.  Following  [4]  we  define  PA-1  formally  as  follows: 


Definition 1 (PA1). Let $\mathcal{E}$ be a public key encryption scheme and A be an algorithm with access to an oracle $\mathcal{O}$ taking input $pk$ and returning a string. Let D be an algorithm that takes as input a string and returns a single bit, and let A* be an algorithm which takes as input a string and some state information and returns either a string or the symbol $\perp$, plus a new state. We call A a ciphertext creator, A* a PA-1-extractor, and D a distinguisher. For security parameter $\lambda$ we define the (distinguishing and extracting) experiments in Figure 1, and then define the PA-1 advantage to be

$$\mathrm{Adv}^{\mathrm{pa1}}_{\mathcal{E},A,D,A^*}(\lambda) = \Pr\left(\mathrm{Exp}^{\mathrm{pa1\text{-}d}}_{\mathcal{E},A,D}(\lambda) = 1\right) - \Pr\left(\mathrm{Exp}^{\mathrm{pa1\text{-}x}}_{\mathcal{E},A,D,A^*}(\lambda) = 1\right).$$

We say A* is a successful PA-1-extractor for A if for every polynomial time distinguisher the above advantage is negligible.


$\mathrm{Exp}^{\mathrm{pa1\text{-}d}}_{\mathcal{E},A,D}(\lambda)$:
- $(pk, sk) \leftarrow \mathrm{KeyGen}(1^\lambda)$.
- $x \leftarrow A^{\mathrm{Decrypt}(\cdot, sk)}(pk)$.
- $d \leftarrow D(x)$.
- Return $d$.

$\mathrm{Exp}^{\mathrm{pa1\text{-}x}}_{\mathcal{E},A,D,A^*}(\lambda)$:
- $(pk, sk) \leftarrow \mathrm{KeyGen}(1^\lambda)$.
- Choose coins coins[A] (resp. coins[A*]) for A (resp. A*).
- $St \leftarrow (pk, \mathrm{coins}[A])$.
- $x \leftarrow A^{\mathcal{O}}(pk; \mathrm{coins}[A])$, replying to the oracle queries $\mathcal{O}(c)$ as follows:
  • $(m, St) \leftarrow A^*(c, St; \mathrm{coins}[A^*])$.
  • Return $m$ to A.
- $d \leftarrow D(x)$.
- Return $d$.

Fig. 1. Experiments $\mathrm{Exp}^{\mathrm{pa1\text{-}d}}_{\mathcal{E},A,D}$ and $\mathrm{Exp}^{\mathrm{pa1\text{-}x}}_{\mathcal{E},A,D,A^*}$

Note, in experiment $\mathrm{Exp}^{\mathrm{pa1\text{-}d}}_{\mathcal{E},A,D}(\lambda)$ the algorithm A's oracle queries are responded to by the genuine decryption algorithm, whereas in $\mathrm{Exp}^{\mathrm{pa1\text{-}x}}_{\mathcal{E},A,D,A^*}(\lambda)$ the queries are responded to by the PA-1-extractor. If A* did not receive the coins coins[A] from A then it would be functionally equivalent to the real decryption oracle; thus the fact that A* gets access to the coins in the second experiment is crucial. Also note that the distinguisher acts independently of A*, and thus this is strictly stronger than having A decide as to whether it is interacting with an extractor or a real decryption oracle.

The intuition is that A* acts as the unknowing subconscious of A, and is able to extract knowledge about A's queries to its oracle. That A* can obtain the underlying message captures the notion that A needs to know the message before it can output a valid ciphertext.

The  following  lemma  is  taken  from  [4]  and  will  be  used  in  the  proof  of  the  main 
theorem. 

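Lemma 2. Let $\mathcal{E}$ be a public key encryption scheme. Let A be a polynomial-time ciphertext creator attacking $\mathcal{E}$, D a polynomial-time distinguisher, and A* a polynomial-time PA-1-extractor. Let DecOK denote the event that all of A*'s answers to A's queries are correct in experiment $\mathrm{Exp}^{\mathrm{pa1\text{-}x}}_{\mathcal{E},A,D,A^*}(\lambda)$. Then,

$$\Pr\left(\mathrm{Exp}^{\mathrm{pa1\text{-}x}}_{\mathcal{E},A,D,A^*}(\lambda) = 1\right) \ge \Pr\left(\mathrm{Exp}^{\mathrm{pa1\text{-}d}}_{\mathcal{E},A,D}(\lambda) = 1\right) - \Pr\left(\overline{\mathrm{DecOK}}\right).$$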

Lattice Knowledge Assumption: Our knowledge assumption can be stated informally as follows: suppose there is a (probabilistic) algorithm C which takes as input a lattice basis of a lattice $L$ and outputs a vector $\mathbf{c}$ suitably close to a lattice point $\mathbf{p}$, i.e. closer than $\varepsilon \cdot \lambda_\infty(L)$ in the $\infty$-norm for a fixed $\varepsilon \in (0, 1/2)$. Then there is an algorithm C* which, on input of $\mathbf{c}$ and the random coins of C, outputs a close lattice vector $\mathbf{p}$, i.e. one for which $\|\mathbf{c} - \mathbf{p}\|_\infty < \varepsilon \cdot \lambda_\infty(L)$. Note that the algorithm C* can therefore act as an $\varepsilon$-CVP-solver for $\mathbf{c}$ in the $\infty$-norm, given the coins coins[C]. Again, as in the PA-1 definition, it is perhaps useful to think of C* as the "subconscious" of C: if C is capable of outputting a vector close to the lattice, it must have known the close lattice vector in the first place. Formally we have:

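Definition 2 (LK-$\varepsilon$). Let $\varepsilon$ be a fixed constant in the interval $(0, 1/2)$. Let $\mathcal{G}$ denote an algorithm which on input of a security parameter $1^\lambda$ outputs a lattice $L$ given by a basis $B$ of dimension $n = n(\lambda)$ and volume $\Delta = \Delta(\lambda)$. Let C be an algorithm that takes a lattice basis $B$ as input, has access to an oracle $\mathcal{O}$, and returns nothing. Let C* denote an algorithm which takes as input a vector $\mathbf{c} \in \mathbb{R}^n$ and some state information, and returns another vector $\mathbf{p} \in \mathbb{R}^n$ plus a new state. Consider the experiment in Figure 2. The LK-$\varepsilon$ advantage of C relative to C* is defined by

$$\mathrm{Adv}^{\mathrm{LK}\text{-}\varepsilon}_{\mathcal{G},C,C^*}(\lambda) = \Pr\left[\mathrm{Exp}^{\mathrm{LK}\text{-}\varepsilon}_{\mathcal{G},C,C^*}(\lambda) = 1\right].$$

We say $\mathcal{G}$ satisfies the LK-$\varepsilon$ assumption, for a fixed $\varepsilon$, if for every polynomial time C there exists a polynomial time C* such that $\mathrm{Adv}^{\mathrm{LK}\text{-}\varepsilon}_{\mathcal{G},C,C^*}(\lambda)$ is a negligible function of $\lambda$.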

The algorithm C is called an LK-$\varepsilon$ adversary and C* an LK-$\varepsilon$ extractor. We now discuss this assumption in more detail. Notice that, for all lattices, if $\varepsilon < 1/4$ then the probability of a random vector being within $\varepsilon \cdot \lambda_\infty(L)$ of the lattice is bounded from above by $1/2^n$, and for lattices which are not highly orthogonal this is likely to hold for all $\varepsilon$ up to $1/2$. Our choice of $T$ in the ccSHE scheme as $U/4$ is to guarantee that our lattice knowledge assumption is applied with $\varepsilon = 1/4$, and hence is more likely to hold.

$\mathrm{Exp}^{\mathrm{LK}\text{-}\varepsilon}_{\mathcal{G},C,C^*}(\lambda)$:
- $B \leftarrow \mathcal{G}(1^\lambda)$.
- Choose coins coins[C] (resp. coins[C*]) for C (resp. C*).
- $St \leftarrow (B, \mathrm{coins}[C])$.
- Run $C^{\mathcal{O}}(B; \mathrm{coins}[C])$ until it halts, replying to the oracle queries $\mathcal{O}(\mathbf{c})$ as follows:
  • $(\mathbf{p}, St) \leftarrow C^*(\mathbf{c}, St; \mathrm{coins}[C^*])$.
  • If $\mathbf{p} \notin \mathcal{L}(B)$, return 1.
  • If $\|\mathbf{p} - \mathbf{c}\|_\infty > \varepsilon \cdot \lambda_\infty(L)$, return 1.
  • Return $\mathbf{p}$ to C.
- Return 0.

Fig. 2. Experiment $\mathrm{Exp}^{\mathrm{LK}\text{-}\varepsilon}_{\mathcal{G},C,C^*}(\lambda)$

If the query $\mathbf{c}$ which C asks of its oracle is within $\varepsilon \cdot \lambda_\infty(L)$ of a lattice point, then we require that C* finds such a close lattice point. If it does not, then the experiment will output 1; and the assumption is that this happens with negligible probability.

Notice that if C asks its oracle a query of a vector which is not within $\varepsilon \cdot \lambda_\infty(L)$ of a lattice point then the algorithm C* may do whatever it wants. However, to determine this condition within the experiment we require that the environment running the experiment is all powerful; in particular, that it can compute $\lambda_\infty(L)$ and decide whether a vector is close enough to the lattice. Thus our experiment, but not algorithms C and C*, is assumed to be information theoretic. This might seem strange at first sight but is akin to a similarly powerful game experiment in the strong security model for certificateless encryption [1], or the definition of insider unforgeable signcryption in [3].

For certain input bases, e.g. reduced ones or ones of small dimension, an algorithm C* can be constructed by standard algorithms to solve the CVP problem. This does not contradict our assumption, since C would also be able to apply such an algorithm and hence "know" the close lattice point. Our assumption is that when this is not true, the only way C could generate a close lattice point (for small enough values of $\varepsilon$) is by computing $\mathbf{x} \in \mathbb{Z}^n$ and perturbing the vector $\mathbf{x} \cdot B$.

Main  Theorem: 

Theorem 1. Let $\mathcal{G}$ denote the lattice basis generator induced from the KeyGen algorithm of the ccSHE scheme, i.e. for a given security parameter $1^\lambda$, run $\mathrm{KeyGen}(1^\lambda)$ to obtain $pk = (\alpha, d, \mu, F(X))$ and $sk = (Z(X), G(X), d, F(X))$, and generate the lattice basis $B$ as in equation (1). Then, if $\mathcal{G}$ satisfies the LK-$\varepsilon$ assumption for $\varepsilon = 1/4$, the ccSHE scheme is PA-1.

Proof. Let A be a polynomial-time ciphertext creator attacking the ccSHE scheme; we show how to construct a polynomial-time PA-1-extractor A*. The creator A takes as input the public key $pk = (\alpha, d, \mu, F(X))$ and random coins coins[A] and returns an integer as the candidate ciphertext. To define A*, we will exploit A to build a polynomial-time LK-$\varepsilon$ adversary C attacking the generator $\mathcal{G}$. By the LK-$\varepsilon$ assumption there exists a polynomial-time LK-$\varepsilon$ extractor C*, that will serve as the main building block for the PA-1-extractor A*. The description of the LK-$\varepsilon$ adversary C is given in Figure 3 and the description of the PA-1-extractor A* is given in Figure 4.


LK-$\varepsilon$ adversary $C^{\mathcal{O}}(B; \mathrm{coins}[C])$
- Let $d = B[0][0]$ and $\alpha = -B[1][0]$.
- Parse coins[C] as $\mu \,\|\, F(X) \,\|\, \mathrm{coins}[A]$.
- Run A on input $(\alpha, d, \mu, F(X))$ and coins coins[A] until it halts, replying to its oracle queries as follows:
  • If A makes a query with input $c$, then
  • Submit $(c, 0, 0, \ldots, 0)$ to $\mathcal{O}$ and let $\mathbf{p}$ denote the response.
  • Let $\mathbf{c} = (c, 0, \ldots, 0) - \mathbf{p}$, and $C(X) = \sum_i c_i X^i$.
  • Let $c' = [C(\alpha)]_d$.
  • If $c' \neq c$ or $\|C(X)\|_\infty \ge T$, then $M(X) \leftarrow \perp$, else $M(X) \leftarrow C(X) \pmod 2$.
  • Return $M(X)$ to A as the oracle response.
- Halt.

Fig. 3. LK-$\varepsilon$ adversary


PA-1-extractor $A^*(c, St[A^*]; \mathrm{coins}[A^*])$
- If $St[A^*]$ is the initial state then
  • parse $St[A^*]$ as $(\alpha, d, \mu, F(X)) \,\|\, \mathrm{coins}[A]$
  • $St[C^*] \leftarrow (\alpha, d, \mu, F(X)) \,\|\, \mathrm{coins}[A]$
- else parse $St[A^*]$ as $(\alpha, d, \mu, F(X)) \,\|\, St[C^*]$
- $(\mathbf{p}, St[C^*]) \leftarrow C^*((c, 0, \ldots, 0), St[C^*]; \mathrm{coins}[A^*])$
- Let $\mathbf{c} = (c, 0, \ldots, 0) - \mathbf{p}$, and $C(X) = \sum_i c_i X^i$
- Let $c' = [C(\alpha)]_d$
- If $c' \neq c$ or $\|C(X)\|_\infty \ge T$, then $M(X) \leftarrow \perp$, else $M(X) \leftarrow C(X) \pmod 2$
- $St[A^*] \leftarrow (\alpha, d, \mu, F(X)) \,\|\, St[C^*]$
- Return $(M(X), St[A^*])$.

Fig. 4. PA-1-extractor


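We first show that A* is a successful PA-1-extractor for A. In particular, let DecOK denote the event that all of A*'s answers to A's queries are correct in $\mathrm{Exp}^{\mathrm{pa1\text{-}x}}_{\mathrm{ccSHE},A,D,A^*}(\lambda)$; then we have that $\Pr(\overline{\mathrm{DecOK}}) \le \mathrm{Adv}^{\mathrm{LK}\text{-}\varepsilon}_{\mathcal{G},C,C^*}(\lambda)$.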

We first consider the case that $c$ is a valid ciphertext, i.e. a ciphertext such that $\mathrm{Decrypt}(c, sk) \neq \perp$. Then by definition of Decrypt in the ccSHE scheme there exists a $C(X)$ such that $c = [C(\alpha)]_d$ and $\|C(X)\|_\infty < T$. Let $\mathbf{p}'$ be the coefficient vector of $c - C(X)$; then by definition of $c$, we have that $\mathbf{p}'$ is a lattice vector that is within distance $T$ of the vector $(c, 0, \ldots, 0)$. Furthermore, since $T \le \lambda_\infty(L)/4$, the vector $\mathbf{p}'$ is the unique vector with this property. Let $\mathbf{p}$ be the vector returned by C* and assume that $\mathbf{p}$ passes the test $\|(c, 0, \ldots, 0) - \mathbf{p}\|_\infty < T$; then we conclude that $\mathbf{p} = \mathbf{p}'$. This shows that if $c$ is a valid ciphertext, it will be decrypted correctly by A*.
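When $c$ is an invalid ciphertext then the real decryption oracle will always output $\perp$, and it can be easily seen that our PA-1-extractor A* will also output $\perp$. Thus in the case of an invalid ciphertext the adversary A cannot tell the two oracles apart. The theorem now follows from combining the inequality $\Pr(\overline{\mathrm{DecOK}}) \le \mathrm{Adv}^{\mathrm{LK}\text{-}\varepsilon}_{\mathcal{G},C,C^*}(\lambda)$ with Lemma 2 as follows:

$$\begin{aligned}
\mathrm{Adv}^{\mathrm{pa1}}_{\mathrm{ccSHE},A,D,A^*}(\lambda) &= \Pr\left(\mathrm{Exp}^{\mathrm{pa1\text{-}d}}_{\mathrm{ccSHE},A,D}(\lambda) = 1\right) - \Pr\left(\mathrm{Exp}^{\mathrm{pa1\text{-}x}}_{\mathrm{ccSHE},A,D,A^*}(\lambda) = 1\right) \\
&\le \Pr\left(\mathrm{Exp}^{\mathrm{pa1\text{-}d}}_{\mathrm{ccSHE},A,D}(\lambda) = 1\right) - \Pr\left(\mathrm{Exp}^{\mathrm{pa1\text{-}d}}_{\mathrm{ccSHE},A,D}(\lambda) = 1\right) + \Pr\left(\overline{\mathrm{DecOK}}\right) \\
&\le \mathrm{Adv}^{\mathrm{LK}\text{-}\varepsilon}_{\mathcal{G},C,C^*}(\lambda).
\end{aligned}$$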

6  ccSHE  is  not  secure  in  the  presence  of  a  CVA  attack 

We now show that our ccSHE scheme is not secure when the attacker, after being given the target ciphertext $c^*$, is given access to an oracle $\mathcal{O}_{\mathrm{CVA}}(c)$ which returns 1 if $c$ is a valid ciphertext (i.e. the decryption algorithm would output a message), and which returns 0 if it is invalid (i.e. the decryption algorithm would output $\perp$). Such an "oracle" can often be obtained in the real world by the attacker observing the behaviour of a party who is fed ciphertexts of the attacker's choosing. Since a CVA attack is strictly weaker than an IND-CCA2 attack, it is an interesting open (and practical) question as to whether an FHE scheme can be CVA secure.

We now show that the ccSHE scheme is not CVA secure, by presenting a relatively trivial attack. Suppose the adversary is given a target ciphertext $c^*$ associated with a hidden message $m^*$. Using the method in Algorithm 2 it is easy to determine the message using access to $\mathcal{O}_{\mathrm{CVA}}(c)$. Basically, we add on multiples of $\alpha^i$ to the ciphertext until it does not decrypt; this allows us to perform a binary search on the $i$-th coefficient of $C(X)$, since we know the bound $T$ on the coefficients of $C(X)$.


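Algorithm 2: CVA attack on ccSHE

    $C(X) \leftarrow 0$
    for $i$ from 0 upto $N - 1$ do
        $L \leftarrow -T + 1$, $U \leftarrow T - 1$
        while $U \neq L$ do
            $M \leftarrow \lceil (U + L)/2 \rceil$
            $c \leftarrow [-c^* + (M + T - 1) \cdot \alpha^i]_d$
            if $\mathcal{O}_{\mathrm{CVA}}(c) = 1$ then
                $L \leftarrow M$
            else
                $U \leftarrow M - 1$
        $C(X) \leftarrow C(X) + U \cdot X^i$
    $m^* \leftarrow C(X) \pmod 2$
    return $m^*$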


If $c_i$ is the $i$-th coefficient of the actual $C(X)$ underlying the target ciphertext $c^*$, then the $i$-th coefficient of the polynomial underlying the ciphertext $c$ being passed to the $\mathcal{O}_{\mathrm{CVA}}$ oracle is given by $M + T - 1 - c_i$. (The remaining coefficients are $-c_j$ with $|c_j| < T$, and every coefficient is smaller than $3T \le U$ in absolute value, so by Lemma 1 decryption recovers this polynomial exactly.) When $M \le c_i$ this coefficient is less than $T$ and so the oracle will return 1; however, when $M > c_i$ the coefficient is greater than or equal to $T$ and hence the oracle will return 0. Thus we can divide the interval for $c_i$ in two depending on the outcome of the test.

It is obvious that the complexity of the attack is $O(N \cdot \log_2 T)$. Since, for the recommended parameters in the key generation method, $N$ and $\log_2 T$ are polynomial functions of the security parameter, we obtain a polynomial time attack.
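The attack is easy to reproduce end to end on a toy instance. The sketch assumes $N = 2$, $F(X) = X^2 + 1$, the hypothetical key $G(X) = 1000003 + 999998X$, and a noise bound of 1000. It models $\mathcal{O}_{\mathrm{CVA}}$ by running ccSHE decryption with the secret key and rejecting when $\|C(X)\|_\infty \ge T$. The attacker-side function `cva_attack` only uses $c^*$, $\alpha$, $d$, $T$ and the oracle's answers.

```python
import random
from math import gcd


def centered_mod(x, d):
    r = x % d
    return r - d if r > d // 2 else r


def round_div(a, d):
    return (2 * a + d) // (2 * d)   # nearest integer to a/d (d > 0)


# Hypothetical N = 2 toy ccSHE key, F(X) = X^2 + 1, G(X) = g0 + g1*X = 1 (mod 2).
N = 2
g0, g1 = 1000003, 999998
assert gcd(g0, g1) == 1
d = g0 * g0 + g1 * g1
alpha = (-g0 * pow(g1, -1, d)) % d
Z = (g0, -g1)
U = d // (2 * N * max(g0, g1))      # Lemma 1 bound, delta_inf = N
T = U // 4


def o_cva(c):
    """Validity oracle: 1 iff ccSHE decryption of c would output a message."""
    q0, q1 = round_div(c * Z[0], d), round_div(c * Z[1], d)
    C = (c - (q0 * g0 - q1 * g1), -(q0 * g1 + q1 * g0))
    ok = centered_mod(C[0] + C[1] * alpha, d) == c and max(map(abs, C)) < T
    return 1 if ok else 0


def cva_attack(c_star):
    """Algorithm 2: binary search on each coefficient c_i of C*(X)."""
    C, queries = [0] * N, 0
    for i in range(N):
        lo, hi = -T + 1, T - 1
        while hi != lo:
            M = (hi + lo + 1) // 2                              # ceil((hi + lo) / 2)
            c = centered_mod(-c_star + (M + T - 1) * pow(alpha, i, d), d)
            queries += 1
            if o_cva(c) == 1:   # coefficient M + T - 1 - c_i < T, so c_i >= M
                lo = M
            else:               # coefficient >= T, so c_i < M
                hi = M - 1
        C[i] = hi
    return [x % 2 for x in C], C, queries


# Target ciphertext for a message hidden from the attacker.
rng = random.Random(9)
m_star = [1, 0]
C_star = [m_star[i] + 2 * rng.randint(-1000, 1000) for i in range(N)]
c_star = centered_mod(C_star[0] + C_star[1] * alpha, d)
m_rec, C_rec, n_queries = cva_attack(c_star)
print(m_rec, n_queries)
```

Rounding the midpoint up is what makes `lo = M` always shrink the interval; rounding down could loop forever once `hi = lo + 1`.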

7  CCA2  Somewhat  Homomorphic  Encryption? 

In this section we deal with an additional issue related to CCA security of somewhat homomorphic encryption schemes. Consider the following scenario: three parties wish to use SHE to compute some information about some data they possess. Suppose the three pieces of data are $m_1$, $m_2$ and $m_3$. The parties encrypt these messages with the SHE scheme to obtain ciphertexts $c_1$, $c_2$ and $c_3$. These are then passed to a third party who computes, via the SHE properties, the required function. The resulting ciphertext is passed to an "Opener" who then decrypts the output and passes the computed value back to the three parties. As such we are using SHE to perform a form of multi-party computation, using SHE to perform the computation and a special third party, called an Opener, to produce the final result.

Consider the above scenario in which the messages lie in $\{0, 1\}$ and the function to be computed is the majority function. Now assume that the third party and the protocol are not synchronous. In such a situation the third party may be able to make a copy of the first party's ciphertext and submit it as his own. The third party thereby forces the above protocol to produce an output equal to the first party's input; thus security of the first party's input is lost. This example may seem a little contrived but it is, in essence, the basis of the recent attack by Smyth and Cortier [28] on the Helios voting system; recall Helios is a voting system based on homomorphic (but not fully homomorphic) encryption.

An obvious defence against the above attack would be to disallow input ciphertexts from one party which are identical to another party's. However, this does not preclude a party from using malleability of the underlying SHE scheme to produce a ciphertext $c_3$ such that $c_3 \neq c_1$, but $\mathrm{Decrypt}(c_1, sk) = \mathrm{Decrypt}(c_3, sk)$. Hence, we need to preclude (at least) forms of benign malleability, but to do so would contradict the fact that we require a fully homomorphic encryption scheme.

To get around this problem we introduce the notion of CCA-embeddable homomorphic encryption. Informally this is an IND-CCA2 public key encryption scheme $\mathcal{E}$ for which, given a ciphertext $c$, one can publicly extract an equivalent ciphertext $c'$ for an IND-CPA homomorphic encryption scheme $\mathcal{E}'$. More formally:

Definition 3. An IND-CPA homomorphic (possibly fully homomorphic) public key encryption scheme $\mathcal{E}' = (\mathrm{KeyGen}', \mathrm{Encrypt}', \mathrm{Decrypt}')$ is said to be CCA-embeddable if there is an IND-CCA encryption scheme $\mathcal{E} = (\mathrm{KeyGen}, \mathrm{Encrypt}, \mathrm{Decrypt})$ and an algorithm Extract such that
- KeyGen produces two secret keys $sk'$, $sk''$, where $sk'$ is in the key space of $\mathcal{E}'$.
- $\mathrm{Decrypt}'(\mathrm{Extract}(\mathrm{Encrypt}(m, pk), sk''), sk') = m$.
- The ciphertext validity check for $\mathcal{E}$ is computable using only the secret key $sk''$.
- CCA1 security of $\mathcal{E}'$ is not compromised by leakage of $sk''$.

As a simple example, for standard homomorphic encryption, ElGamal is CCA-embeddable into the Cramer-Shoup encryption scheme [10]. We note that this notion of CCA-embeddable encryption was independently arrived at by [7] for standard (singularly) homomorphic encryption in the context of providing a defence against the earlier mentioned attack on Helios. See [7] for a more complete discussion of the concept.

As a proof of concept for somewhat homomorphic encryption schemes we show that, in the random oracle model, the somewhat homomorphic encryption schemes considered in this paper are CCA-embeddable. We do this by utilizing the Naor-Yung paradigm [22] for constructing IND-CCA encryption schemes, and the zero-knowledge proofs of knowledge for semi-homomorphic schemes considered in [6]. Note that our construction is inefficient; we leave it as an open problem as to whether more specific constructions can be provided for the specific SHE schemes considered in this paper.

Construction: Given an SHE scheme $\mathcal{E}' = (\mathrm{KeyGen}', \mathrm{Encrypt}', \mathrm{Decrypt}')$ we construct the scheme $\mathcal{E} = (\mathrm{KeyGen}, \mathrm{Encrypt}, \mathrm{Decrypt})$ into which $\mathcal{E}'$ embeds as follows, where NIZKPoK = (Prove, Verify) is a suitable non-malleable non-interactive zero-knowledge proof of knowledge of equality of two plaintexts:


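KeyGen$(1^\lambda)$:
- $(pk_1, sk_1) \leftarrow \mathrm{KeyGen}'(1^\lambda)$.
- $(pk_2, sk_2) \leftarrow \mathrm{KeyGen}'(1^\lambda)$.
- $pk \leftarrow (pk_1, pk_2)$, $sk \leftarrow (sk_1, sk_2)$.
- Return $(pk, sk)$.

Extract$(c)$:
- Parse $c$ as $(c'_1, c'_2, \Sigma)$.
- Return $c'_1$.

Encrypt$(m, pk; r)$:
- $c'_1 \leftarrow \mathrm{Encrypt}'(m, pk_1; r'_1)$.
- $c'_2 \leftarrow \mathrm{Encrypt}'(m, pk_2; r'_2)$.
- $\Sigma \leftarrow \mathrm{Prove}(c'_1, c'_2; m, r'_1, r'_2)$.
- $c \leftarrow (c'_1, c'_2, \Sigma)$.
- Return $c$.

Decrypt$(c, sk)$:
- Parse $c$ as $(c'_1, c'_2, \Sigma)$.
- If $\mathrm{Verify}(\Sigma, c'_1, c'_2) = 0$ return $\perp$.
- $m \leftarrow \mathrm{Decrypt}'(c'_1, sk_1)$.
- Return $m$.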


All that remains is to describe how to instantiate the NIZKPoK. We do this using the Fiat-Shamir heuristic applied to the Sigma-protocol in Figure 5. The protocol is derived from the same principles as those in [6], and security (completeness, soundness and zero-knowledge) can be proved in an almost identical way to that in [6]. The main difference is that an adjustment must be made to the response part of the protocol to deal with the message space being defined modulo two. We give the Sigma protocol in the simplified case of application to the Gentry-Halevi variant, where the message space is equal to $\{0, 1\}$. Generalising the protocol to the Full-Space Smart-Vercauteren variant requires a more complex "adjustment" to the values of $t_1$ and $t_2$ in the protocol. Notice that the soundness error in the following protocol is only $1/2$; thus we need to repeat the protocol a number of times to obtain negligible soundness error, which leads to a loss of efficiency.
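Prover and Verifier:

1. Prover (holds $m \in \{0,1\}$ and randomness $r_1, r_2$): sends $c_1 = \mathrm{Encrypt}'(m, pk'_1; r_1)$ and $c_2 = \mathrm{Encrypt}'(m, pk'_2; r_2)$.
2. Prover: samples $y \leftarrow \{0,1\}$ and fresh randomness $s_1, s_2$; sends $a_1 \leftarrow \mathrm{Encrypt}'(y, pk'_1; s_1)$ and $a_2 \leftarrow \mathrm{Encrypt}'(y, pk'_2; s_2)$.
3. Verifier: sends a challenge $e \leftarrow \{0,1\}$.
4. Prover: sends $z \leftarrow y \oplus e \cdot m$, $t_1 \leftarrow s_1 + e \cdot r_1 + e \cdot y \cdot m$ and $t_2 \leftarrow s_2 + e \cdot r_2 + e \cdot y \cdot m$.
5. Verifier: accepts if and only if $\mathrm{Encrypt}'(z, pk'_1; t_1) = a_1 + e \cdot c_1$ and $\mathrm{Encrypt}'(z, pk'_2; t_2) = a_2 + e \cdot c_2$.

Fig. 5. ZKPoK of equality of two plaintexts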
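A toy check of the algebra in Figure 5, in Python. The function `enc` is a hypothetical, insecure stand-in that merely has the Gentry-Halevi-like shape “plaintext plus twice the randomness”, which is what makes the adjustment $e \cdot y \cdot m$ in $t_1, t_2$ work, since $y + m = (y \oplus m) + 2ym$. The sketch exercises completeness for both challenges and shows that the honest response formula fails for $e = 1$ when the two ciphertexts encrypt different bits; it says nothing about zero-knowledge or the security of any real scheme.

```python
import secrets

P = 2**61 - 1  # prime modulus of the hypothetical toy scheme

def enc(m: int, pk: int, r: int) -> int:
    """Toy Encrypt'(m, pk; r) = (m + 2r) * pk mod P. Insecure; it only mimics the
    'plaintext plus twice the randomness' shape that the adjustment relies on."""
    return ((m + 2 * r) * pk) % P

def prover_commit(pk1: int, pk2: int):
    y = secrets.randbelow(2)
    s1, s2 = secrets.randbelow(2**40), secrets.randbelow(2**40)
    return (y, s1, s2), (enc(y, pk1, s1), enc(y, pk2, s2))

def prover_respond(state, m: int, r1: int, r2: int, e: int):
    y, s1, s2 = state
    z = y ^ (e * m)                  # plaintext response, computed modulo two
    t1 = s1 + e * r1 + e * y * m     # y + m = (y XOR m) + 2ym, so the term ym moves
    t2 = s2 + e * r2 + e * y * m     # into the randomness: this is the "adjustment"
    return z, t1, t2

def verifier_check(pk1, pk2, c1, c2, a, e, resp) -> bool:
    z, t1, t2 = resp
    a1, a2 = a
    return (enc(z, pk1, t1) == (a1 + e * c1) % P
            and enc(z, pk2, t2) == (a2 + e * c2) % P)

def run_round(m: int, pk1: int, pk2: int, e: int) -> bool:
    """One honest execution of the protocol with a fixed challenge e."""
    r1, r2 = secrets.randbelow(2**40), secrets.randbelow(2**40)
    c1, c2 = enc(m, pk1, r1), enc(m, pk2, r2)
    state, a = prover_commit(pk1, pk2)
    return verifier_check(pk1, pk2, c1, c2, a, e, prover_respond(state, m, r1, r2, e))
```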
Abstract. We show that homomorphic evaluation of (wide enough) arithmetic circuits can be accomplished with only polylogarithmic overhead. Namely, we present a construction of fully homomorphic encryption (FHE) schemes that for security parameter $\lambda$ can evaluate any width-$\Omega(\lambda)$ circuit with $t$ gates in time $t \cdot \mathrm{polylog}(\lambda)$.

To get low overhead, we use the recent batch homomorphic evaluation techniques of Smart-Vercauteren and Brakerski-Gentry-Vaikuntanathan, who showed that homomorphic operations can be applied to “packed” ciphertexts that encrypt vectors of plaintext elements. In this work, we introduce permuting/routing techniques to move plaintext elements across these vectors efficiently. Hence, we are able to implement general arithmetic circuits in a batched fashion without ever needing to “unpack” the plaintext vectors.

We  also  introduce  some  other  optimizations  that  can  speed  up  homomorphic  evaluation  in  certain  cases.  For  example,  we 
show  how  to  use  the  Frobenius  map  to  raise  plaintext  elements  to  powers  of  p  at  the  “cost”  of  a  linear  operation. 
1 Introduction


Fully  homomorphic  encryption  (FHE)  [16,9,8]  allows  a  worker  to  perform  arbitrarily-complex  dynamically- 
chosen  computations  on  encrypted  data,  despite  not  having  the  secret  decryption  key.  Processing  encrypted  data 
homomorphically  requires  more  computation  than  processing  the  data  unencrypted.  But  how  much  more?  What 
is  the  overhead,  the  ratio  of  encrypted  computation  complexity  to  unencrypted  computation  complexity  (using  a 
circuit model of computation)? Here, under the ring-LWE assumption, we show that the overhead can be made as low as polylogarithmic in the security parameter.

We accomplish this by packing many plaintexts into each ciphertext; each ciphertext has $\Omega(\lambda)$ “plaintext slots”.
Then,  we  describe  a  complete  set  of  operations  -  Add,  Mult  and  Permute  -  that  allows  us  to  evaluate  arbitrary 
circuits  while  keeping  the  ciphertexts  packed.  Batch  Add  and  Mult  have  been  done  before  [18],  and  follow  easily 
from  the  Chinese  Remainder  Theorem  within  our  underlying  polynomial  ring.  Here  we  introduce  the  operation 
Permute,  that  allows  us  to  homomorphically  move  data  between  the  plaintext  slots,  show  how  to  realize  it  from 
our  underlying  algebra,  and  how  to  use  it  to  evaluate  arbitrary  circuits. 

Our approach begins with the observation [3, 18] that we can use an automorphism group $\mathcal{H}$ associated to our underlying ring to “rotate” or “re-align” the contents of the plaintext slots. (These automorphisms were used in a somewhat similar manner by Lyubashevsky et al. [15] in their proof of the pseudorandomness of RLWE.) While $\mathcal{H}$ alone enables only a few permutations (e.g., “rotations”), we show that any permutation can be constructed as a log-depth permutation network, where each level consists of a constant number of “rotations”, batch-additions and batch-multiplications. Our method works when the underlying ring has an associated automorphism group $\mathcal{H}$ which is abelian and sharply transitive, a condition that we prove always holds for our scheme's parameters.

Ultimately, the Add, Mult and Permute operations can all be accomplished with $\tilde{O}(\lambda)$ computation by building on the recent Brakerski-Gentry-Vaikuntanathan (BGV) “FHE without bootstrapping” scheme [3], which builds on prior work by Brakerski and Vaikuntanathan and others [5, 4, 12]. Thus, we obtain an FHE scheme that can evaluate any circuit that has $\Omega(\lambda)$ average width with only $\mathrm{polylog}(\lambda)$ overhead. For comparison, the smallest overhead for FHE was $\tilde{O}(\lambda^{3.5})$ [19] until BGV recently reduced it to $\tilde{O}(\lambda)$ [3].^1

In  addition  to  their  essential  role  in  letting  us  move  data  across  plaintext  slots,  ring  automorphisms  turn  out  to 
have  interesting  secondary  consequences:  they  also  enable  more  nimble  manipulation  of  data  within  plaintext  slots. 
Specifically,  in  some  cases  we  can  use  them  to  raise  the  packed  plaintext  elements  to  a  high  power  with  hardly  any 
increase  in  the  noise  magnitude  of  the  ciphertext!  In  practice,  this  could  permit  evaluation  of  high-degree  circuits 
without  resorting  to  bootstrapping,  in  applications  such  as  computing  AES.  See  Appendix  A.3. 


1.1  Packing  Plaintexts  and  Batched  Homomorphic  Computation 

Smart and Vercauteren [17, 18] were the first to observe that, by an application of the Chinese Remainder Theorem to number fields, the plaintext space of some previous FHE schemes can be partitioned into a vector of “plaintext slots”, and that a single homomorphic Add or Mult of a pair of ciphertexts implicitly adds or multiplies (component-wise) the entire plaintext vectors. Each plaintext slot is defined to hold an element in some finite field $\mathbb{K}_n = \mathbb{F}_{p^n}$, and, abstractly, if one has two ciphertexts that hold (encrypt) messages $m_0, \ldots, m_{\ell-1} \in \mathbb{K}_n$ and $m'_0, \ldots, m'_{\ell-1} \in \mathbb{K}_n$ respectively in plaintext slots $0, \ldots, \ell-1$, applying $\ell$-Add to the two ciphertexts gives a new ciphertext that holds $m_0 + m'_0, \ldots, m_{\ell-1} + m'_{\ell-1}$, and applying $\ell$-Mult gives a new ciphertext that holds $m_0 \cdot m'_0, \ldots, m_{\ell-1} \cdot m'_{\ell-1}$. Smart and Vercauteren used this observation for batch (or SIMD [11]) homomorphic operations. That is, they show how to evaluate a function $f$ homomorphically $\ell$ times in parallel on $\ell$ different inputs, with approximately the same cost that it takes to evaluate the function once without batching.

^1 However, the polylog factors in our new scheme are rather large. It remains to be seen how much of an improvement this approach yields in practice, as compared to the $\tilde{O}(\lambda^{3.5})$ approach implemented in [10, 19].

Here is a taste of how these separate plaintext slots are constructed algebraically. As an example, for the ring-LWE-based scheme, suppose we use the polynomial ring $A = \mathbb{Z}[x]/(x^\ell + 1)$ where $\ell$ is a power of 2. Ciphertexts are elements of $A_q^2$ where (as in [3]) $q$ has only $\mathrm{polylog}(\lambda)$ bits. The “aggregate” plaintext space is $A_p$ (that is, ring elements taken modulo $p$) for some small prime $p \equiv 1 \bmod 2\ell$. Any prime $p \equiv 1 \bmod 2\ell$ splits over the field associated to this ring - that is, in $A$, the ideal generated by $p$ is the product of $\ell$ ideals $\{\mathfrak{p}_i\}$ each of norm $p$ - and therefore $A_p \cong A_{\mathfrak{p}_0} \times \cdots \times A_{\mathfrak{p}_{\ell-1}}$. Consequently, using the Chinese remainder theorem, we can encode $\ell$ independent mod-$p$ plaintexts $m_0, \ldots, m_{\ell-1} \in \{0, \ldots, p-1\}$ as the unique element in $A_p$ that is in all of the cosets $m_i + \mathfrak{p}_i$. Thus, in a single ciphertext, we may have $\ell$ independent plaintext “slots”.
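The slot structure can be seen concretely in the plaintext ring alone, with no encryption. The toy parameters below are our own choice, not taken from the paper: $\ell = 4$, $p = 17$ (so $p \equiv 1 \bmod 8$), and $\zeta = 2$, which has order 8 modulo 17. Then $x^4 + 1 \equiv \prod_i (x - \zeta^{2i+1}) \pmod{17}$, slot $i$ corresponds to the prime ideal $\mathfrak{p}_i = (17, x - \zeta^{2i+1})$, and reducing modulo $\mathfrak{p}_i$ is evaluation at $\zeta^{2i+1}$. The sketch checks that ring addition and multiplication act slot-wise (Python 3.8+ for `pow(x, -1, m)`).

```python
from typing import List

ELL, P, ZETA = 4, 17, 2        # toy parameters: 17 = 1 mod 8, ZETA has order 8 mod 17
ROOTS = [pow(ZETA, 2 * i + 1, P) for i in range(ELL)]   # the roots 2, 8, 15, 9 of x^4 + 1

def decode(a: List[int]) -> List[int]:
    """Slot i = a(x) reduced modulo p_i = (P, x - ROOTS[i]), i.e. a(ROOTS[i]) mod P."""
    return [sum(c * pow(rt, j, P) for j, c in enumerate(a)) % P for rt in ROOTS]

def encode(slot_values: List[int]) -> List[int]:
    """CRT: the unique a(x) of degree < ELL with a(ROOTS[i]) = slot_values[i] mod P."""
    inv_ell = pow(ELL, -1, P)
    return [inv_ell * sum(s * pow(rt, -j, P) for s, rt in zip(slot_values, ROOTS)) % P
            for j in range(ELL)]

def ring_add(a: List[int], b: List[int]) -> List[int]:
    return [(x + y) % P for x, y in zip(a, b)]

def ring_mul(a: List[int], b: List[int]) -> List[int]:
    """Product in Z_P[x]/(x^ELL + 1): x^ELL wraps around with a minus sign."""
    out = [0] * ELL
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < ELL:
                out[i + j] = (out[i + j] + x * y) % P
            else:
                out[i + j - ELL] = (out[i + j - ELL] - x * y) % P
    return out
```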

In this work, we often use $\ell$-Add and $\ell$-Mult to efficiently implement a Select operation: Given an index set $I$ we can construct a vector $v_I$ of “select bits” $(v_0, \ldots, v_{\ell-1})$, such that $v_i = 1$ if $i \in I$ and $v_i = 0$ otherwise. Then element-wise multiplication of a packed ciphertext $c$ with the select vector $v_I$ results in a new ciphertext that contains only the plaintext elements in the slots corresponding to $I$, and zero elsewhere. Moreover, by generating two complementing select vectors $v_I$ and $v_{\bar{I}}$ we can mix-and-match the slots from two packed ciphertexts $c_1$ and $c_2$: Setting $c = (v_I \times c_1) + (v_{\bar{I}} \times c_2)$, we pack into $c$ the slots from $c_1$ at indexes from $I$ and the slots from $c_2$ elsewhere.
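A minimal plaintext model of this mix-and-match step: slot vectors are Python lists of residues mod $p$, and the hypothetical helpers `slot_mul`/`slot_add` stand in for homomorphic $\ell$-Mult/$\ell$-Add (in the scheme, $v_I$ would first be encoded into a plaintext ring element).

```python
from typing import List, Set

def select_vector(index_set: Set[int], ell: int) -> List[int]:
    """v_I: 1 in the slots whose index is in I, 0 elsewhere."""
    return [1 if i in index_set else 0 for i in range(ell)]

def slot_mul(a: List[int], b: List[int], p: int) -> List[int]:
    return [(x * y) % p for x, y in zip(a, b)]

def slot_add(a: List[int], b: List[int], p: int) -> List[int]:
    return [(x + y) % p for x, y in zip(a, b)]

def mix(c1: List[int], c2: List[int], index_set: Set[int], p: int) -> List[int]:
    """c = (v_I x c1) + (v_Ibar x c2): slots of c1 at indexes in I, slots of c2 elsewhere."""
    v_i = select_vector(index_set, len(c1))
    v_ibar = [1 - bit for bit in v_i]
    return slot_add(slot_mul(v_i, c1, p), slot_mul(v_ibar, c2, p), p)
```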

While batching is useful in many settings, it does not, by itself, yield low-overhead homomorphic computation in general, as it does not help us to reduce the overhead of computing a complicated function just once. Just as in normal program execution of SIMD instructions (e.g., the SSE instructions on x86), one needs a method of moving data between slots in each SIMD word.

1.2  Permuting  Plaintexts  Within  the  Plaintext  Slots 

To reduce the overhead of homomorphic computation in general, we need a complete set of operations over packed vectors of plaintexts. The approach above allows us to add or multiply messages that are in the same plaintext slot, but what if we want to add the content of the $i$-th slot in one ciphertext to the content of the $j$-th slot of another ciphertext, for $i \neq j$? We can “unpack” the slots into separate ciphertexts (say, using homomorphic decryption^2 [8, 9]), but there is little hope that this approach could yield very efficient FHE. Instead, we complement $\ell$-Add and $\ell$-Mult with an operation $\ell$-Permute to move data efficiently across slots within a given ciphertext, and efficient procedures to clone slots from a packed ciphertext and move them around to other packed ciphertexts.

Brakerski, Gentry, and Vaikuntanathan [3] observed that for certain parameter settings, one can use automorphisms associated with the algebraic ring $A$ to “rotate” all of the plaintext spaces simultaneously, sort of like turning a dial on a safe. That is, one can transform a ciphertext that holds $m_0, m_1, \ldots, m_{\ell-1}$ in its $\ell$ slots into another ciphertext that holds $m_i, m_{i+1}, \ldots, m_{i+\ell-1}$ (for an arbitrary given $i$, index arithmetic mod $\ell$), and this rotation operation takes time quasi-linear in the ciphertext size, which is quasi-linear in the security parameter. They used this tool to construct Pack and Unpack algorithms whereby separate ciphertexts could be aggregated (packed) into a single ciphertext with packed plaintexts before applying bootstrapping (and then the refreshed ciphertext would be unpacked), thereby lowering the amortized cost of bootstrapping.
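To see how a ring automorphism moves data between slots, the sketch below reuses the toy parameters $\ell = 4$, $p = 17$, $\zeta = 2$ (our choice, for illustration only) and applies $\sigma_k : a(x) \mapsto a(x^k)$ for odd $k$ in the plaintext ring $\mathbb{Z}_{17}[x]/(x^4 + 1)$. Since $\sigma_k(a)(\zeta^e) = a(\zeta^{ek})$, each $\sigma_k$ permutes the slots without changing their contents.

```python
from typing import List

ELL, P, ZETA = 4, 17, 2                   # toy parameters: 17 = 1 mod 8, ZETA has order 8
ODD = [2 * i + 1 for i in range(ELL)]     # slot i <-> root ZETA**(2i + 1) of x^4 + 1

def slots(a: List[int]) -> List[int]:
    """Slot contents of a(x): its values at ZETA**(2i+1) modulo P."""
    return [sum(c * pow(ZETA, e * j, P) for j, c in enumerate(a)) % P for e in ODD]

def automorphism(a: List[int], k: int) -> List[int]:
    """sigma_k: a(x) -> a(x^k) in Z_P[x]/(x^ELL + 1), for odd k."""
    out = [0] * ELL
    for j, c in enumerate(a):
        e = (j * k) % (2 * ELL)           # x^(2*ELL) = 1 in this ring
        if e < ELL:
            out[e] = (out[e] + c) % P
        else:                             # x^e = -x^(e - ELL) because x^ELL = -1
            out[e - ELL] = (out[e - ELL] - c) % P
    return out

def slot_permutation(k: int) -> List[int]:
    """perm with slots(sigma_k(a))[i] == slots(a)[perm[i]], since sigma_k(a)(z) = a(z^k)."""
    return [ODD.index((e * k) % (2 * ELL)) for e in ODD]
```

With this slot numbering, $\sigma_5$ rotates the slots by two positions, whereas $\sigma_3$ swaps slots 0 and 1 and slots 2 and 3, in line with the remark that the automorphisms do not always act as exact rotations.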

We exploit these automorphisms more fully, using the basic rotations that the automorphisms give us to construct permutation networks that can permute data in the plaintext slots arbitrarily. We also extend the application of the automorphisms to more general underlying rings, beyond the specific parameter settings considered in prior work [5, 4, 3]. This lets us devise low-overhead homomorphic schemes for arithmetic circuits over essentially any small finite field $\mathbb{F}_{p^n}$.

^2 This is the approach suggested in [18] for Gentry's original FHE scheme.
Our efficient implementation of Permute, described in Section 3, uses the Beneš/Waksman permutation network [2, 20]. This network consists of two back-to-back butterfly networks of width $2^k$, where each level in the network has $2^{k-1}$ “switch gates” and each switch gate swaps (or not) its two inputs, depending on a control bit. It is possible to realize any permutation of $\ell = 2^k$ items by appropriately setting the control bits of all the switch gates. Viewing this network as acting on $k$-bit addresses, the $i$-th level of the network partitions the $2^k$ addresses into $2^{k-1}$ pairs, where each pair of addresses differs only in the $|i - k|$-th bit, and then it swaps (or not) those pairs. The fact that the pairs in the $i$-th level always consist of addresses that differ by exactly $2^{|i-k|}$ makes it easy to implement each level using rotations: all we need is one rotation by $2^{|i-k|}$ and another by $-2^{|i-k|}$, followed by two batched Select operations.
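One level of the network, as described above, in a plaintext model: `rotate` stands in for the automorphism-based rotation and `select` for the arithmetic Select built from $\ell$-Mult and $\ell$-Add. The control bits are taken as given; computing them for a target permutation (for example with the standard looping algorithm for Beneš networks) is not shown. All function names are hypothetical.

```python
from typing import List

def rotate(v: List[int], s: int) -> List[int]:
    """Cyclic slot rotation: result[j] = v[(j + s) mod n]."""
    n = len(v)
    return [v[(j + s) % n] for j in range(n)]

def select(bits: List[int], a: List[int], b: List[int]) -> List[int]:
    """Arithmetic Select: a[j] where bits[j] = 1, b[j] where bits[j] = 0."""
    return [s * x + (1 - s) * y for s, x, y in zip(bits, a, b)]

def level_bits(k: int) -> List[int]:
    """Address bit handled by levels i = 1, ..., 2k - 1 of a network on 2^k slots."""
    return [abs(i - k) for i in range(1, 2 * k)]

def benes_level(v: List[int], bit: int, swap: List[int]) -> List[int]:
    """Swap (or not) every pair of slots (j, j XOR 2^bit); swap[j] must be equal for
    both slots of a pair. Uses two rotations followed by two Selects."""
    d = 1 << bit
    up = rotate(v, d)        # up[j] = v[j + d], the partner of j when bit `bit` of j is 0
    down = rotate(v, -d)     # down[j] = v[j - d], the partner of j when that bit is 1
    take_up = [swap[j] * (1 - ((j >> bit) & 1)) for j in range(len(v))]
    take_down = [swap[j] * ((j >> bit) & 1) for j in range(len(v))]
    return select(take_down, down, select(take_up, up, v))
```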

For general rings $A$, the automorphisms do not always exactly “rotate” the plaintext slots. Instead, they act on the slots in a way that depends on a quotient group $\mathcal{H}$ of the appropriate Galois group. Nonetheless, we use basic theorems from Galois theory, in conjunction with appropriate generalizations of the Beneš/Waksman procedure, to construct a permutation network of depth $O(\log \ell)$ that can realize any permutation over the $\ell$ plaintext slots, where each level of the network consists of a constant number of permutations from $\mathcal{H}$ and Select operations. As with the rotations considered in [3], applying permutations from $\mathcal{H}$ can be done in time quasi-linear in ciphertext size, which is only quasi-linear in the security parameter. Overall, we find that permutation networks and Galois theory are a surprisingly fruitful combination.

We note that Damgård, Ishai and Krøigaard [7] used permutation networks in a somewhat analogous fashion
to  perform  secure  multiparty  computation  with  packed  secret  shares.  In  their  setting,  which  permits  interaction 
between  the  parties,  the  permutations  can  be  evaluated  using  much  simpler  mathematical  machinery. 


1.3 FHE with Polylog Overhead

In our discussion above, we glossed over the fact that ciphertext sizes in a BGV-like cryptosystem [3] depend polynomially on the depth of the circuit being evaluated, because the modulus size must grow with the depth of the circuit (unless bootstrapping [8, 9] is used). So, without bootstrapping, the “polylog overhead” result only applies to circuits of polylog depth. However, decryption itself can be accomplished in log-depth [3], and moreover the parameters can be set so that a ciphertext with $\Omega(\lambda)$ slots can be decrypted using a circuit of size $\tilde{O}(\lambda)$. Therefore, “recryption” can be accomplished with polylog overhead, and we obtain FHE with polylog overhead for arbitrary (wide enough) circuits.


2  Computing  on  (Encrypted)  Arrays 

As we explained above, our main tool for low-overhead homomorphic computation is to compute on “packed ciphertexts”, namely make each ciphertext hold a vector of plaintext values rather than a single value. Throughout this section we let $\ell$ be a parameter specifying the number of plaintext values that are packed inside each ciphertext, namely we always work with $\ell$-vectors of plaintext values. Let $\mathbb{K}_n = \mathbb{F}_{p^n}$ denote the plaintext space (e.g., $\mathbb{K}_n = \mathbb{F}_2$ if we are dealing with binary circuits directly). It was shown in [3, 18] how to homomorphically evaluate batch addition and multiplication operations on $\ell$-vectors:

$$\ell\text{-Add}\big((u_0, \ldots, u_{\ell-1}), (v_0, \ldots, v_{\ell-1})\big) \mapsto (u_0 + v_0, \ldots, u_{\ell-1} + v_{\ell-1})$$

$$\ell\text{-Mult}\big((u_0, \ldots, u_{\ell-1}), (v_0, \ldots, v_{\ell-1})\big) \mapsto (u_0 \times v_0, \ldots, u_{\ell-1} \times v_{\ell-1})$$

on packed ciphertexts in time $\tilde{O}\big((\ell + \lambda) \cdot \log|\mathbb{K}_n|\big)$, where $\lambda$ is the security parameter (with addition and multiplication in $\mathbb{K}_n$).^3 Specifically, if the size of our plaintext space is polynomially bounded and we set $\ell = O(\lambda)$, then we can evaluate the above operations homomorphically in time $\tilde{O}(\lambda)$.

Unfortunately, component-wise $\ell$-Add and $\ell$-Mult are not sufficient to perform arbitrary computations on encrypted arrays, since data at different indexes within the arrays can never interact. To get a complete set of operations for arrays, we introduce the $\ell$-Permute operation that can arbitrarily permute the data within the $\ell$-element arrays. Namely, for any permutation $\pi$ over the indexes $[\ell] = \{0, 1, \ldots, \ell - 1\}$, we want to homomorphically evaluate the function

$$\ell\text{-Permute}_\pi\big((v_0, \ldots, v_{\ell-1})\big) = \big(v_{\pi(0)}, \ldots, v_{\pi(\ell-1)}\big)$$

on a packed ciphertext, with complexity similar to the above. We will show how to implement $\ell$-Permute homomorphically in Sections 3 and 4 below. For now, we just assume that such an implementation is available and show how to use it to obtain a low-overhead implementation of general circuits.

2.1 Computing with $\ell$-Fold Gates

We are interested in computing arbitrary functions using “$\ell$-fold gates” that operate on $\ell$-element arrays as above. We assume that the function $f(\cdot)$ to be computed is specified using a fan-in-2 arithmetic circuit with $t$ “normal” arithmetic gates (that operate on singletons). Our goal is to implement $f$ using as few $\ell$-fold gates as possible, hopefully not much more than $t/\ell$ of them.

We assume that the input to $f$ is presented in a packed form, namely when computing an $r$-variate function $f(x_1, \ldots, x_r)$ we get as input $\lceil r/\ell \rceil$ arrays (indexed $A_0, \ldots, A_{\lceil r/\ell \rceil - 1}$), with the $j$-th array containing the input elements $x_{j\ell+1}$ through $x_{(j+1)\ell}$. The last array may contain less than $\ell$ elements, and the unused entries contain “don't care” elements. In fact, throughout the computation we allow all of the arrays to contain “don't care” entries. We say that an array is sparse if it contains $\ell/2$ or more “don't care” entries. We maintain the invariant that our collection of arrays is always at least half full, i.e., we hold $r$ values using at most $\lceil 2r/\ell \rceil$ $\ell$-element arrays.

The gates that we use in the computation are the $\ell$-Add, $\ell$-Mult, and $\ell$-Permute gates from above. The rest of this section is devoted to establishing the following theorem:

Theorem 1. Let $\ell$, $t$, $w$ and $W$ be parameters. Then any $t$-gate fan-in-2 arithmetic circuit $C$ with average width $w$ and maximum width $W$ can be evaluated using a network of $O(\lceil t/\ell \rceil \cdot \lceil \ell/w \rceil \cdot \log W \cdot \mathrm{polylog}(\ell))$ $\ell$-fold gates of types $\ell$-Add, $\ell$-Mult, and $\ell$-Permute. The depth of this network of $\ell$-fold gates is at most $O(\log W)$ times that of the original circuit $C$, and the description of the network can be computed in time $\tilde{O}(t)$ given the description of $C$.

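Before turning to proving Theorem 1, we point out that Theorem 1 implies that if the original circuit $C$ has size $t = \mathrm{poly}(\lambda)$, depth $L$, and average width $w = \Omega(\lambda)$, and if we set the packing parameter as $\ell = \Theta(\lambda)$, then (since $W \le t = \mathrm{poly}(\lambda)$, so that $\log W = O(\log\lambda)$) we get an $O(L \cdot \log\lambda)$-depth implementation of $C$ using $O(\frac{t}{\lambda} \cdot \mathrm{polylog}(\lambda))$ $\ell$-fold gates. If implementing each $\ell$-fold gate takes $\tilde{O}(L\lambda)$ time, then the total time to evaluate $C$ is no more than

$$O\Big(\frac{t}{\lambda}\,\mathrm{polylog}(\lambda) \cdot L \cdot \lambda \cdot \mathrm{polylog}(\lambda)\Big) = O\big(t \cdot L \cdot \mathrm{polylog}(\lambda)\big).$$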

Therefore, with this choice of parameter (and for “wide enough” circuits of average width $\Omega(\lambda)$), our overhead for evaluating depth-$L$ circuits is only $O(L \cdot \mathrm{polylog}(\lambda))$. And if $L$ is also polylogarithmic, as in BGV with bootstrapping [3], then the total overhead is polylogarithmic in the security parameter.

^3 To compute $L$ levels of such operations, the complexity expression becomes $\tilde{O}\big((\ell + \lambda)(L \cdot \log|\mathbb{K}_n|)\big)$.
The high-level idea of the proof of Theorem 1 is what one would expect. Consider an arbitrary fan-in two arithmetic circuit $C$. Suppose that we have $w$ output wire values of level $i - 1$ packed into roughly $w/\ell$ arrays. We need to route these output values to their correct input positions at level $i$. It should be obvious that the Permute gates facilitate this routing, except for two complications:

1. The mapping from outputs of level $i - 1$ to inputs of level $i$ is not a permutation. Specifically, level-$(i-1)$ gates may have high fan-out, and so some of the output values may need to be cloned.

2. Once the output values are cloned sufficiently (for a total of, say, $w'$ values), routing to level $i$ apparently calls for a big permutation over $w'$ elements, not just a small permutation within arrays of $\ell$ elements.

Below we show that these complications can be handled efficiently.


2.2 Permutations over Hyper-Rectangles

First, consider the second complication from above - namely, that we need to perform a permutation over some $w$ elements (possibly $w \gg \ell$) using $\ell$-Add, $\ell$-Mult, and $\ell$-Permute operations that only work on $\ell$-element arrays. We use the following basic fact (cf. [14]); for completeness we provide a proof in Appendix B.

Lemma 1. Let $S = \{0, \ldots, a-1\} \times \{0, \ldots, b-1\}$ be a set of $ab$ positions, arranged as a matrix of $a$ rows and $b$ columns. For any permutation $\pi$ over $S$, there are permutations $\pi_1, \pi_2, \pi_3$ such that $\pi = \pi_3 \circ \pi_2 \circ \pi_1$ (that is, $\pi$ is the composition of the three permutations) and such that $\pi_1$ and $\pi_3$ only permute positions within each column (these permutations only change the row, not the column, of each element) and $\pi_2$ only permutes positions within each row. Moreover, there is a polynomial-time algorithm that given $\pi$ outputs the decomposition permutations $\pi_1, \pi_2, \pi_3$.

In our context, Lemma 1 says that if we have $w$ elements packed into $k = \lceil w/\ell \rceil$ $\ell$-element arrays, we can express any permutation $\pi$ of these elements as $\pi = \pi_3 \circ \pi_2 \circ \pi_1$, where $\pi_2$ invokes $\ell$-Permute ($k$ times in parallel) to permute data within the respective arrays, and $\pi_1, \pi_3$ only permute ($\ell$ times in parallel) elements that share the same index within their respective arrays. In Section 2.3, we describe how to implement $\pi_1, \pi_3$ using $\ell$-Add and $\ell$-Mult, and analyze the overall efficiency of implementing $\pi$. The following generalization of Lemma 1 to higher dimensions will be used later in this work. It is proved by invoking Lemma 1 recursively.

Lemma 2. Let $S = [n_1] \times \cdots \times [n_k]$, where $[n_i] = \{0, \ldots, n_i - 1\}$. (Each element in $S$ has $k$ coordinates.) For any permutation $\pi$ over $S$, there are permutations $\pi_1, \ldots, \pi_{2k-1}$ such that $\pi = \pi_{2k-1} \circ \cdots \circ \pi_1$ and such that $\pi_i$ affects only the $i$-th coordinate for $i \le k$ and only the $(2k - i)$-th coordinate for $i > k$.
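One standard way to compute the decomposition of Lemma 1 (not necessarily the argument used in Appendix B) goes through bipartite matchings: each element is an edge from its current column to its target column, giving an $a$-regular bipartite multigraph on $b + b$ vertices; by König's theorem it splits into $a$ perfect matchings, and the index $t$ of the matching containing an element is the row it visits between $\pi_1$ and $\pi_3$. The Python sketch below finds the matchings with simple augmenting paths (Kuhn's algorithm), which runs in polynomial time but is not optimized; the function name is hypothetical.

```python
from typing import Dict, Tuple

Pos = Tuple[int, int]  # (row, column)

def decompose(a: int, b: int, pi: Dict[Pos, Pos]):
    """Split a permutation pi of an a x b grid into pi = pi3 o pi2 o pi1, where pi1 and
    pi3 move elements only within their column and pi2 only within its row."""
    target_col = {pos: pi[pos][1] for pos in pi}   # one multigraph edge per element
    remaining = set(pi)
    color: Dict[Pos, int] = {}
    for t in range(a):
        # The remaining multigraph (source column -> target column) is (a - t)-regular,
        # so it has a perfect matching; find one with augmenting paths.
        match: Dict[int, Pos] = {}                  # target column -> matched element

        def augment(col: int, seen: set) -> bool:
            for pos in [p for p in remaining if p[1] == col]:
                tc = target_col[pos]
                if tc in seen:
                    continue
                seen.add(tc)
                if tc not in match or augment(match[tc][1], seen):
                    match[tc] = pos
                    return True
            return False

        for col in range(b):
            if not augment(col, set()):
                raise ValueError("no perfect matching: pi is not a bijection of the grid")
        for pos in match.values():
            color[pos] = t                          # this element passes through row t
        remaining -= set(match.values())

    pi1: Dict[Pos, Pos] = {}
    pi2: Dict[Pos, Pos] = {}
    pi3: Dict[Pos, Pos] = {}
    for (r, c), t in color.items():
        tr, tc = pi[(r, c)]
        pi1[(r, c)] = (t, c)      # within column c: row r -> row t
        pi2[(t, c)] = (t, tc)     # within row t: column c -> target column
        pi3[(t, tc)] = (tr, tc)   # within the target column: row t -> target row
    return pi1, pi2, pi3
```

In the setting of Section 2.3, rows are the $k$ packed arrays and columns are slot indexes, so `pi2` corresponds to a batch of $\ell$-Permute operations and `pi1`, `pi3` to the same-index permutations realized with Select.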


2.3  Batch  Selections,  Swaps,  and  Permutation  Networks 

We now describe how to use $\ell$-Add and $\ell$-Mult to realize the outer permutations $\pi_1, \pi_3$, which permute ($\ell$ times in parallel) elements that share the same index within their respective arrays. To perform these permutations, we can apply a permutation network à la Beneš/Waksman [2, 20]. Recall that an $r$-dimensional Beneš network consists of two back-to-back butterfly networks. Namely it is a $(2r - 1)$-level network with $2^r$ nodes in each level, where for $i = 1, 2, \ldots, 2r - 1$, we have an edge connecting node $j$ in level $i - 1$ to node $j'$ in level $i$ if the indexes $j, j'$ are either equal (a “straight edge”) or they differ only in the $|r - i|$'th bit (a “cross edge”). The following lemma is an easy corollary of Lemma 2.
Lemma 3. [13, Thm 3.11] Given any one-to-one mapping $\pi$ of $2^r$ inputs to $2^r$ outputs in an $r$-dimensional Beneš network (one input per level-0 node and one output per level-$(2r-1)$ node), there is a set of node-disjoint paths from the inputs to the outputs connecting input $i$ to output $\pi(i)$ for all $i$.

In our setting, to implement our $\pi_1$ and $\pi_3$ from Lemma 1 we need to evaluate $\ell$ of these permutation networks in parallel, one for each index in our $\ell$-fold arrays. Assume for simplicity that the number of $\ell$-fold arrays is a power of two, say $2^r$, and denote these arrays by $A_0, \ldots, A_{2^r-1}$; we would have a $(2r - 1)$-level network, where the $i$'th level in the network consists of operating on pairs of arrays $(A_j, A_{j'})$, such that the indexes $j, j'$ differ only in the $|r - i|$'th bit.

The operation applied to two such arrays $A_j, A_{j'}$ works separately on the different indexes of these arrays. For each $k = 0, 1, \ldots, \ell - 1$, the operation will either swap $A_j[k] \leftrightarrow A_{j'}[k]$ or will leave these two entries unchanged, depending on whether the paths in the $k$'th permutation network use the cross edges or the straight edges between nodes $j$ and $j'$ in levels $i - 1, i$ of the permutation network.

Thus, evaluating $\ell$ such permutation networks in parallel reduces to the following Select function: Given two arrays $A = [m_0, \ldots, m_{\ell-1}]$ and $A' = [m'_0, \ldots, m'_{\ell-1}]$ and a string $S = s_0 \cdots s_{\ell-1} \in \{0, 1\}^\ell$, the operation $\mathrm{Select}_S(A, A')$ outputs an array $A'' = [m''_0, \ldots, m''_{\ell-1}]$ where, for each $k$, $m''_k = m_k$ if $s_k = 1$ and $m''_k = m'_k$ otherwise. It is easy to implement $\mathrm{Select}_S(A, A')$ using just the $\ell$-Add and $\ell$-Mult operations - in particular

$$\mathrm{Select}_S(A, A') = \ell\text{-Add}\big(\ell\text{-Mult}(A, S),\ \ell\text{-Mult}(A', \bar{S})\big)$$

where $\bar{S}$ is the bitwise complement of $S$. Note that $\mathrm{Select}_{\bar{S}}(A, A')$ outputs precisely the elements that are discarded by $\mathrm{Select}_S(A, A')$. So, $\mathrm{Select}_S(A, A')$ and $\mathrm{Select}_{\bar{S}}(A, A')$ are exactly like the arrays $A$ and $A'$, except that some pairs of elements with identical indexes have been swapped - namely, those pairs at index $k$ where $s_k = 0$. Hence we obtain the following; again the proof is deferred to Appendix B.

Lemma 4. Evaluating $\ell$ permutation networks in parallel, each permuting $k$ items, can be accomplished using $O(k \cdot \log k)$ gates of $\ell$-Add and $\ell$-Mult, and depth $O(\log k)$. Also, evaluating a permutation $\pi$ over $k \cdot \ell$ elements that are packed into $k$ $\ell$-element arrays can be accomplished using $k$ $\ell$-Permute gates and $O(k \log k)$ gates of $\ell$-Add and $\ell$-Mult, in depth $O(\log k)$. Moreover, there is an efficient algorithm that given $\pi$ computes the circuit of $\ell$-Permute, $\ell$-Add, and $\ell$-Mult gates that evaluates it; specifically, we can do it in time $O(k \cdot \ell \cdot \log(k \cdot \ell))$.
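The swap step used before Lemma 4, namely that $\mathrm{Select}_S(A, A')$ and $\mathrm{Select}_{\bar{S}}(A, A')$ exchange the entries at every index $k$ with $s_k = 0$, can be checked on plaintext arrays. In this sketch (hypothetical helper names), integers stand in for elements of $\mathbb{K}_n$ and `l_add`/`l_mult` for the homomorphic $\ell$-Add/$\ell$-Mult.

```python
from typing import List, Tuple

def l_add(a: List[int], b: List[int]) -> List[int]:
    return [x + y for x, y in zip(a, b)]

def l_mult(a: List[int], b: List[int]) -> List[int]:
    return [x * y for x, y in zip(a, b)]

def select_s(s: List[int], a: List[int], a_prime: List[int]) -> List[int]:
    """Select_S(A, A') = l-Add(l-Mult(A, S), l-Mult(A', S-bar))."""
    s_bar = [1 - bit for bit in s]
    return l_add(l_mult(a, s), l_mult(a_prime, s_bar))

def conditional_swap(s: List[int], a: List[int],
                     a_prime: List[int]) -> Tuple[List[int], List[int]]:
    """(Select_S(A, A'), Select_{S-bar}(A, A')): A and A' with the entries at every
    index k where s[k] == 0 exchanged."""
    s_bar = [1 - bit for bit in s]
    return select_s(s, a, a_prime), select_s(s_bar, a, a_prime)
```

For instance, with $S = 1010$, $A = [1, 2, 3, 4]$ and $A' = [5, 6, 7, 8]$ the two outputs are $[1, 6, 3, 8]$ and $[5, 2, 7, 4]$.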

2.4  Cloning:  Handling  High  Fan-out  in  the  Circuit 

We have described how to efficiently realize a permutation over $w > \ell$ items using $\ell$-Add, $\ell$-Mult and $\ell$-Permute gates that operate on $\ell$-element arrays. However, the wiring between adjacent levels of a fan-in-two circuit is typically not a permutation, since we typically have gates with high fan-out. We therefore need to clone the output values of these high-fan-out gates before performing a permutation that maps them to their input positions at the next level. We describe an efficient procedure for this “cloning” step.

A cloning procedure. The input to the cloning procedure consists of a collection of $k$ arrays, each with $\ell$ slots, where each slot is either “full” (i.e., contains a value that we want to use) or “empty” (i.e., contains a don't-care value). We assume that initially more than $k \cdot \ell/2$ of the available slots are full, and will maintain a similar invariant throughout the procedure. Denote the number of full slots in the input arrays by $w$ (with $k \cdot \ell/2 < w \le k \cdot \ell$), and denote the $i$'th input value by $v_i$. The ordering of input values is arbitrary - e.g., we concatenate all the arrays and order input values by their index in the concatenated multi-array.

We are also given a set of positive integers $m_1, \ldots, m_w \ge 1$, such that $v_1$ should be duplicated $m_1$ times, $v_2$ should be duplicated $m_2$ times, etc. We say that $m_i$ is the intended multiplicity of $v_i$. The total number of full slots in the output arrays will therefore be $w' = m_1 + m_2 + \cdots + m_w \ge w$. In more detail, the output of the cloning procedure must consist of some number $k'$ of $\ell$-slot arrays, where $k'\ell/2 < w' \le k'\ell$, such that $v_1$ appears in at least $m_1$ of the output slots, $v_2$ appears in at least $m_2$ of the output slots, etc.

Denote the largest intended multiplicity of any value by $M = \max_i\{m_i\}$. The cloning procedure works in $\lceil \log M \rceil$ phases, such that after the $j$'th phase each value $v_i$ is duplicated $\min(m_i, 2^j)$ times. Each phase consists of making a copy of all the arrays, then for values that occur too many times marking the excess slots as empty (i.e., marking the extra occurrences as don't-care values), and finally merging arrays that are “sparse” until the remaining arrays are at least half full. A simple way to merge two sparse arrays is to permute them so that the full slots appear in the left half in one array and the right half in the other, and then apply Select in the obvious way. A pseudo-code description of this procedure is given in Figure 1, whilst the proof of the following lemma is in Appendix B.


Input: $k$ $\ell$-slot arrays, $A_1, \ldots, A_k$, each of the $k \cdot \ell$ slots containing either a value or the special symbol ‘$\perp$’, and $w$ positive integers $m_1, \ldots, m_w \ge 1$, where $w$ is the number of full slots in the input arrays.

Output: $k'$ $\ell$-slot arrays, $A'_1, \ldots, A'_{k'}$, with each slot containing either a value or the special symbol ‘$\perp$’, where $k'/2 < (\sum_i m_i)/\ell \le k'$ and each input value $v_i$ is replicated $m_i$ times in the output arrays.

0. Set $M \leftarrow \max_i\{m_i\}$
1. For $j = 1$ to $\lceil \log M \rceil$   // The $j$'th phase
2.     Make another copy of all the arrays   // Duplicate everything
3.     While there are values $v_i$ with multiplicity more than $m_i$:
4.         Replace the excess occurrences of $v_i$ by $\perp$   // Remove redundant entries
5.     While there exist pairs of arrays that have between them $\ell$ or more slots with $\perp$:
6.         Pick one such pair and merge the two arrays   // Merge sparse arrays
7. Output the remaining arrays

Fig. 1. The cloning procedure
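The cloning procedure of Figure 1 can be simulated on plaintext arrays as follows. This is a sketch under simplifying assumptions: arrays are Python lists, `None` stands for $\perp$, and the merge step simply packs the full slots of two arrays into one, whereas in the encrypted setting the merge would be realized with $\ell$-Permute and Select. The names `clone`, `merge` and `find_sparse_pair` are hypothetical.

```python
from collections import Counter

BOT = None  # stands in for the special "empty" symbol


def merge(a, b):
    """Pack the full slots of two sparse ell-slot arrays into one array.
    (Plaintext stand-in for the permute-then-Select merge in the text.)"""
    full = [x for x in a + b if x is not BOT]
    assert len(full) <= len(a), "arrays are not sparse enough to merge"
    return full + [BOT] * (len(a) - len(full))


def find_sparse_pair(arrays, ell):
    """Indices of two arrays having ell or more empty slots between them."""
    for x in range(len(arrays)):
        for y in range(x + 1, len(arrays)):
            if arrays[x].count(BOT) + arrays[y].count(BOT) >= ell:
                return x, y
    return None


def clone(arrays, mult):
    """Figure 1 on plaintext arrays; mult maps each value v_i to m_i >= 1."""
    arrays = [list(a) for a in arrays]        # leave the caller's input intact
    ell = len(arrays[0])
    M = max(mult.values())                    # step 0
    for _ in range((M - 1).bit_length()):     # step 1: ceil(log2 M) phases
        arrays += [list(a) for a in arrays]   # step 2: duplicate everything
        seen = Counter()
        for a in arrays:                      # steps 3-4: drop excess copies
            for idx, v in enumerate(a):
                if v is not BOT:
                    seen[v] += 1
                    if seen[v] > mult[v]:
                        a[idx] = BOT
        while (pair := find_sparse_pair(arrays, ell)) is not None:  # steps 5-6
            merged = merge(arrays[pair[0]], arrays[pair[1]])
            arrays = [a for j, a in enumerate(arrays) if j not in pair] + [merged]
    return arrays                             # step 7


print(clone([[1, 2, 3, BOT]], {1: 3, 2: 1, 3: 2}))
# [[1, 2, 3, 1], [1, 3, None, None]]
```

Using `(M - 1).bit_length()` for $\lceil \log_2 M \rceil$ avoids floating-point rounding in the phase count.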


Lemma 5. (i) The cloning procedure from Figure 1 is correct.

(ii) Assuming that at least half the slots in the input arrays are full, this procedure can be implemented by a network of $O(w'/\ell \cdot \log w')$ $\ell$-fold gates of type $\ell$-Add, $\ell$-Mult and $\ell$-Permute, where $w'$ is the total number of full slots in the output, $w' = \sum_i m_i$. The depth of the network is bounded by $O(\log w')$.

(iii) This network can be constructed in time $\tilde{O}(w')$, given the input arrays and the $m_i$'s.

We also describe some more optimizations in Appendix A, including a different cloning procedure that improves on the complexity bound in Lemma 5. Putting all the above together we can efficiently evaluate a circuit using $\ell$-Permute, $\ell$-Add and $\ell$-Mult, yielding a proof of Theorem 1; see Appendix B.

3  Permutation  Networks  from  Abelian  Group  Actions 

As we will show in Section 4, the algebra underlying our FHE scheme makes it possible to perform inexpensive operations on packed ciphertexts that have the effect of permuting the $\ell$ plaintext slots inside this packed ciphertext. However, not every permutation can be realized this way; the algebra only gives us a small set of “simple” permutations. For example, in some cases, the given automorphisms “rotate” the plaintext slots, transforming a ciphertext that encrypts the vector $(v_0, \ldots, v_{\ell-1})$ into one that encrypts $(v_k, \ldots, v_{\ell-1}, v_0, \ldots, v_{k-1})$, for any value of $k$ of our choosing. (See Section 3.2 for the general case.)

Our goal in this section is therefore to efficiently implement an $\ell$-Permute$_\pi$ operation for an arbitrary permutation $\pi$ using only the simple permutations that the algebra gives us (and also the $\ell$-Add and $\ell$-Mult operations that we have available). We begin in Section 3.1 by showing how to efficiently realize arbitrary permutations when the small set of “simple permutations” is the set of rotations. In Section 3.2 we generalize this construction to a more general set of simple permutations.


3.1  Permutation  Networks  from  Cyclic  Rotations  and  Swaps 

Consider the Benes permutation network discussed in Lemma 3. It has the interesting property that when the $2^r$ items being permuted are labeled with $r$-bit strings, then the $i$-th level only swaps (or not) pairs whose index differs in the $|r - i|$-th bit. In other words, the $i$-th level swaps only disjoint pairs that have offset $2^{|r-i|}$ from each other. We call this operation an “offset-swap”, since all pairs of elements that might be swapped have the same mutual offset.

Definition 1 (Offset Swap). Let $I_\ell = \{0, \ldots, \ell - 1\}$. We say that a permutation $\pi$ over $I_\ell$ is an $i$-offset swap if it consists only of 1-cycles and 2-cycles (i.e., $\pi = \pi^{-1}$), and moreover all the 2-cycles in $\pi$ are of the form $(k, k + i \bmod \ell)$ for different values $k \in I_\ell$.

Offset swaps modulo $\ell$ are easy to implement by combining two rotations with the Select operation defined in Section 2.3. Specifically, for an $i$-offset swap, we need rotations by $i$ and $-i \bmod \ell$ and two Select operations. By Lemma 3, a Benes network can realize any permutation over $2^r$ elements using $2r - 1$ levels, where the $i$-th level is a $2^{|r-i|}$-offset swap modulo $2^r$. An $i$-offset swap modulo $2^r$, with $\ell \le 2^r < 2\ell$, can be cobbled together using a constant number of offset swaps modulo $\ell$ and Select operations, with offsets $i$ and $2\ell - i$. Therefore, given a cyclic group of “simple” permutations $\mathcal{H}$ and Select operations, we can implement any permutation using a Benes network with low overhead. Specifically, we prove the following lemma in Appendix B.

Lemma 6. Fix an integer $\ell$ and let $k = \lceil \log \ell \rceil$. Any permutation $\pi$ over $I_\ell = \{0, \ldots, \ell - 1\}$ can be implemented by a $(2k - 1)$-level network, with each level consisting of a constant number of rotations and Select operations on $\ell$-arrays.

Moreover, regardless of the permutation $\pi$, the rotations that are used in level $i$ ($i = 1, \ldots, 2k - 1$) are always exactly $2^{|k-i|}$ and $\ell - 2^{|k-i|}$ positions, and the network depends on $\pi$ only via the bits that control the Select operations. Finally, this network can be constructed in time $\tilde{O}(\ell)$ given the description of $\pi$.
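To make the offset-swap construction described before Lemma 6 concrete, here is a plaintext-level sketch of an $i$-offset swap built from exactly two rotations (by $i$ and by $-i \bmod \ell$) and two Select operations, with Select written as $b \cdot x + (1 - b) \cdot y$ so that it only needs slot-wise addition and multiplication. The function names are illustrative, and the sketch assumes the swapped pairs are disjoint, as Definition 1 requires.

```python
def rotate(a, k):
    """Slot j of the result holds a[(j + k) mod ell] (rotation by k positions)."""
    ell = len(a)
    return [a[(j + k) % ell] for j in range(ell)]


def select(bits, x, y):
    """Slot-wise Select using only ell-Mult and ell-Add: b*x + (1-b)*y."""
    return [b * xi + (1 - b) * yi for b, xi, yi in zip(bits, x, y)]


def offset_swap(a, i, lows):
    """i-offset swap with 2-cycles (k, k+i mod ell) for k in lows (pairs disjoint)."""
    ell = len(a)
    b_low = [0] * ell    # slot k takes its value from slot k + i
    b_high = [0] * ell   # slot k + i takes its value from slot k
    for k in lows:
        b_low[k] = 1
        b_high[(k + i) % ell] = 1
    fwd = rotate(a, i)    # rotation by i
    back = rotate(a, -i)  # rotation by -i mod ell
    return select(b_low, fwd, select(b_high, back, a))


print(offset_swap(list(range(8)), 3, [0, 1]))  # [3, 4, 2, 0, 1, 5, 6, 7]
```

The rotation amounts depend only on $i$, while the permutation being realized is encoded entirely in the Select bits, which is the property Lemma 6 relies on.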


3.2  Generalizing  to  Sharply-Transitive  Abelian  Groups 

Below,  we  extend  our  techniques  above  to  deal  with  a  more  general  set  of  “simple  permutations”  that  we  get  from 
our  ring  automorphisms.  (See  Sections  4  and  C.3.) 

Definition 2 (Sharply Transitive Permutation Groups). Denote the $\ell$-element symmetric group by $S_\ell$ (i.e., the group of all permutations over $I_\ell = \{0, \ldots, \ell - 1\}$), and let $\mathcal{H}$ be a subgroup of $S_\ell$. The subgroup $\mathcal{H}$ is sharply transitive if for every two indexes $i, j \in I_\ell$ there exists a unique permutation $h \in \mathcal{H}$ such that $h(i) = j$.

Of course, the group of rotations is an example of an abelian and sharply transitive permutation group. It is abelian: rotating by $k_1$ positions and then by $k_2$ positions is the same as rotating by $k_2$ positions and then by $k_1$ positions. It is also sharply transitive: for all $i, j$ there is a single rotation amount that maps index $i$ to index $j$, namely rotation by $j - i$. However, rotations are certainly not the only example. We now explain how to efficiently realize arbitrary permutations using as building blocks the permutations from any sharply-transitive abelian group.

Recall that any abelian group is isomorphic to a direct product of cyclic groups, hence $\mathcal{H} \cong C_{\ell_1} \times \cdots \times C_{\ell_k}$ (where $C_{\ell_i}$ is a cyclic group with $\ell_i$ elements for some integers $\ell_i \ge 2$, where $\ell_i$ divides $\ell_{i+1}$ for all $i$). As any cyclic group with $\ell_i$ elements is isomorphic to $\mathbb{Z}_{\ell_i} = \{0, 1, \ldots, \ell_i - 1\}$ with the operation of addition mod $\ell_i$, we will identify elements in $\mathcal{H}$ with vectors in the box $B = \mathbb{Z}_{\ell_1} \times \cdots \times \mathbb{Z}_{\ell_k}$, where composing two group elements corresponds to adding their associated vectors (modulo the box). The group $\mathcal{H}$ is generated by the $k$ unit vectors $\{e_r\}_{r=1}^{k}$ (where $e_r = (0, \ldots, 0, 1, 0, \ldots, 0)$ with 1 in the $r$-th position). We stress that our group $\mathcal{H}$ has polynomial size, so we can efficiently compute the representation of elements in $\mathcal{H}$ as vectors in $B$.

Since $\mathcal{H}$ is a sharply transitive group of permutations over the indexes $I_\ell = \{0, \ldots, \ell - 1\}$, we can similarly label the indexes in $I_\ell$ by vectors in $B$: Pick an arbitrary index $i_0 \in I_\ell$, then for all $h \in \mathcal{H}$ label the index $h(i_0) \in I_\ell$ with the vector associated with $h$. This procedure labels every element in $I_\ell$ with exactly one vector from $B$, since for every $i \in I_\ell$ there is a unique $h \in \mathcal{H}$ such that $h(i_0) = i$. Also, since $\mathcal{H} \cong B$, we use all the vectors in $B$ for this labeling ($|\mathcal{H}| = |B| = \ell$). Note that with this labeling, applying the generator $e_r$ to an index labeled with vector $v \in B$ yields an index labeled with $v' = v + e_r \bmod B$. Namely, we increment by one the $r$'th entry in $v$ (mod $\ell_r$), leaving the other entries unchanged.
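The labeling of $I_\ell$ by vectors in $B$ can be sketched on a small example of our own choosing: $\mathcal{H} = \mathbb{Z}_{15}^*$ acting on its own 8 elements by multiplication, which is abelian and sharply transitive and decomposes as $\langle 14 \rangle \times \langle 2 \rangle \cong \mathbb{Z}_2 \times \mathbb{Z}_4$. The helper `label_indexes` is hypothetical; the loop at the end checks that applying generator $e_r$ increments the $r$'th coordinate of the label.

```python
from itertools import product
from math import gcd


def label_indexes(elems, gens, orders, modulus, i0=0):
    """Label index j by the vector v in B = Z_{orders[0]} x ... such that
    h_v maps elems[i0] to elems[j], where h_v multiplies by prod gens[r]^v[r]."""
    pos = {x: j for j, x in enumerate(elems)}
    labels = {}
    for v in product(*(range(o) for o in orders)):
        h = 1
        for g, e in zip(gens, v):
            h = h * pow(g, e, modulus) % modulus
        j = pos[h * elems[i0] % modulus]
        assert j not in labels, "not sharply transitive"  # each index labeled once
        labels[j] = v
    return labels


m, gens, orders = 15, [14, 2], [2, 4]           # Z_15^* = <14> x <2>
elems = [x for x in range(1, m) if gcd(x, m) == 1]
labels = label_indexes(elems, gens, orders, m)

# Applying generator e_r (multiplying by gens[r]) adds e_r to the label mod B.
pos = {x: j for j, x in enumerate(elems)}
for j, v in labels.items():
    for r, (g, o) in enumerate(zip(gens, orders)):
        moved = labels[pos[g * elems[j] % m]]
        assert moved == tuple((vk + 1) % o if k == r else vk for k, vk in enumerate(v))
```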

In other words, rather than a one-dimensional array, we view $I_\ell$ as a $k$-dimensional matrix (by identifying it with $B$). The action of the generator $e_r$ on this matrix is to rotate it by one along the $r$-th dimension, and similarly applying the permutation $e_r^{t} \in \mathcal{H}$ to this matrix rotates it by $t$ positions along the $r$-th dimension. For example, when $k = 2$, we view $I_\ell$ as an $\ell_1 \times \ell_2$ matrix, and the group $\mathcal{H}$ includes permutations of the form $e_1^{t}$ that rotate all the columns of this matrix by $t$ positions and also permutations of the form $e_2^{t}$ that rotate all the rows of this matrix by $t$ positions.

Using Lemma 6, we can now implement arbitrary permutations along the $r$'th dimension using a permutation network built from offset-swaps along the $r$'th dimension. Moreover, since the offset amounts used in the network do not depend on the specific permutation that we want to implement, we can use just one such network to implement in parallel different arbitrary permutations on different $r$'th-dimension sub-matrices. For example, in the 2-dimensional case, we can effect a different permutation on every column, yet realize all these different permutations using just one network of rotations and Selects, by using the same offset amounts but different Select bits for the different columns. More generally, we can realize $\ell/\ell_r$ arbitrary (different) permutations along all the different “generalized columns” in dimension $r$, using a network of depth $O(\log \ell_r)$ consisting of permutations $h \in \mathcal{H}$ and $\ell$-fold Select operations (and we can construct that network in time $(\ell/\ell_r) \cdot \tilde{O}(\ell_r) = \tilde{O}(\ell)$).
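The observation that a single network of rotations can apply different permutations to different columns can be illustrated with NumPy, assuming `np.roll` plays the role of a rotation along one dimension and `np.where` plays the role of Select. In this sketch both columns undergo a 1-offset swap along dimension 1 with the same rotation amounts, but each column swaps a different pair of rows because its Select bits differ. The function name is hypothetical.

```python
import numpy as np


def column_offset_swap(A, i, low_mask):
    """i-offset swap along axis 0 with per-column Select bits:
    low_mask[k, c] = True swaps rows k and (k + i) mod A.shape[0] in column c."""
    high_mask = np.roll(low_mask, i, axis=0)   # marks row (k + i) of each swapped pair
    fwd = np.roll(A, -i, axis=0)               # row k now holds old row k + i
    back = np.roll(A, i, axis=0)               # row k now holds old row k - i
    # Two Selects, exactly as in the 1-dimensional construction.
    return np.where(low_mask, fwd, np.where(high_mask, back, A))


A = np.arange(8).reshape(4, 2)
low = np.zeros((4, 2), dtype=bool)
low[0, 0] = True   # column 0: swap rows 0 and 1
low[2, 1] = True   # column 1: swap rows 2 and 3
print(column_offset_swap(A, 1, low))
```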

Once we are able to realize different arbitrary permutations along the different “generalized columns” in all the dimensions, we can apply Lemma 2. That lemma allows us to decompose any permutation $\pi$ on $\mathcal{H}$ into $2k - 1$ permutations $\pi = \pi_1 \circ \cdots \circ \pi_{2k-1}$, where each $\pi_i$ consists only of permuting the generalized columns in dimension $r = |k - i|$. Hence we can realize an arbitrary permutation on $\mathcal{H}$ as a network of permutations $h \in \mathcal{H}$ and $\ell$-fold Select operations, of total depth bounded by $\sum_{r=1}^{k} 2 \cdot O(\log \ell_r) = O(\log \ell)$ (the last bound follows since $\ell = \prod_r \ell_r$), and we can construct that network in time bounded by $\sum_{r=1}^{k} 2 \cdot \tilde{O}(\ell) = \tilde{O}(\ell)$ (the bound follows since $k \le \log \ell$). Concluding this discussion, we have:

Lemma 7. Fix any integer $\ell$ and any abelian sharply-transitive group of permutations over $I_\ell$, $\mathcal{H} \subseteq S_\ell$. Then for every permutation $\pi \in S_\ell$, there is a permutation network of depth $O(\log \ell)$ that realizes $\pi$, where each level of the network consists of a constant number of permutations from $\mathcal{H}$ and Select operations on $\ell$-arrays.

Moreover, the permutations used in each level do not depend on the particular permutation $\pi$; the network depends on $\pi$ only via the bits that control the Select operations. Finally, this network can be constructed in time $\tilde{O}(\ell)$ given the description of $\pi$ and the labeling of the elements of $I_\ell$ and $\mathcal{H}$ as vectors in $B$. □

Lemma 7 tells us that we can implement an arbitrary Permute operation using a log-depth network of permutations $h \in \mathcal{H}$ (in conjunction with $\ell$-Add and $\ell$-Mult). Plugging this into Theorem 1 we therefore obtain:

Theorem 2. Let $\ell$, $t$, $w$ and $W$ be parameters, and let $\mathcal{H}$ be an abelian, sharply-transitive group of permutations over $I_\ell$. Then any $t$-gate fan-in-2 arithmetic circuit $C$ with average width $w$ and maximum width $W$ can be evaluated using a network of $O(\lceil t/\ell \rceil \cdot \lceil \ell/w \rceil \cdot \log W \cdot \mathrm{polylog}(\ell))$ $\ell$-fold gates of types $\ell$-Add, $\ell$-Mult, and $h \in \mathcal{H}$. The depth of this network of $\ell$-fold gates is at most $O(\log W \cdot \log \ell)$ times that of the original circuit $C$, and the description of the network can be computed in time $O(t \cdot \log \ell)$ given the description of $C$. □

4  FHE  With  Polylog  Overhead 

Theorem 2 implies that if we could efficiently realize $\ell$-Add, $\ell$-Mult, and $\mathcal{H}$-actions on packed ciphertexts (where $\mathcal{H}$ is a sharply transitive abelian group of permutations on $\ell$-slot arrays), then we can evaluate arbitrary (wide enough) circuits with low overhead. Specifically, if we could set $\ell = \Theta(\lambda)$ and realize $\ell$-Add, $\ell$-Mult, and $\mathcal{H}$-actions in time $\tilde{O}(\lambda)$, then we can realize any circuit of average width $\Omega(\lambda)$ with just $\mathrm{polylog}(\lambda)$ overhead. It remains only to describe an FHE system that has the required complexity for these basic homomorphic operations.

4.1  The  Basic  Setting  of  FHE  Schemes  Based  on  Ideal  Lattices  and  Ring  LWE 

Many of the known FHE schemes work over a polynomial ring $\mathbb{A} = \mathbb{Z}[X]/F(X)$, where $F(X)$ is an irreducible monic polynomial, typically a cyclotomic polynomial. Ciphertexts are typically vectors (consisting of one or two elements) over $\mathbb{A}_q = \mathbb{A}/q\mathbb{A}$, where $q$ is an integer modulus, and the plaintext space of the scheme is $\mathbb{A}_p = \mathbb{A}/p\mathbb{A}$ for some integer modulus $p \ll q$ with $\gcd(p, q) = 1$, for example $p = 2$. (Namely, the plaintext is represented as an integer polynomial with coefficients mod $p$.) Secret keys are also vectors over $\mathbb{A}_q$, and decryption works by taking the inner product $b \leftarrow \langle c, s \rangle$ in $\mathbb{A}_q$ (so $b$ is an integer polynomial with coefficients in $(-q/2, q/2]$), then recovering the message as $b \bmod p$. Namely, the decryption formula is $[[\langle c, s \rangle \bmod F(X)]_q]_p$, where $[\cdot]_q$ denotes modular reduction into the range $(-q/2, q/2]$. Below we consider ciphertext vectors and secret-key vectors with two entries, since this is indeed the case for the variant of the BGV scheme [3] that we use.
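The decryption formula can be tried out on a toy instance. The sketch below is an assumption-laden illustration, not the BGV scheme: it replaces $F(X)$ by $X^m - 1$ (the simplification adopted in Section 4.2), uses a hypothetical toy key $s = (1, z)$ with small $z$ and a toy encryption routine, and offers no security whatsoever. It only shows that $[[\langle c, s \rangle]_q]_p$ recovers the aggregate plaintext when the noise is small.

```python
import random


def polymul_mod(f, g, m):
    """Product of coefficient lists f, g in Z[X]/(X^m - 1) (cyclic convolution)."""
    out = [0] * m
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[(i + j) % m] += fi * gj
    return out


def center(x, q):
    """[x]_q: the representative of x modulo q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def decrypt(c, s, q, p, m):
    """Aggregate plaintext [[<c, s>]_q]_p with <c, s> = c0*s0 + c1*s1."""
    inner = [u + v for u, v in zip(polymul_mod(c[0], s[0], m),
                                   polymul_mod(c[1], s[1], m))]
    return [center(x, q) % p for x in inner]


def toy_keygen(m, rng):
    """TOY key s = (1, z) with a small ternary z (not a real BGV key)."""
    z = [rng.choice((-1, 0, 1)) for _ in range(m)]
    return ([1] + [0] * (m - 1), z)


def toy_encrypt(msg, s, q, p, m, rng):
    """TOY, insecure: c0 = msg + p*e - c1*z mod q, so <c, s> = msg + p*e mod q."""
    c1 = [rng.randrange(q) for _ in range(m)]
    e = [rng.choice((-1, 0, 1)) for _ in range(m)]
    c1z = polymul_mod(c1, s[1], m)
    c0 = [(mi + p * ei - t) % q for mi, ei, t in zip(msg, e, c1z)]
    return (c0, c1)


rng = random.Random(2012)
m, q, p = 16, 2**20, 2
s = toy_keygen(m, rng)
msg = [rng.randrange(p) for _ in range(m)]
assert decrypt(toy_encrypt(msg, s, q, p, m, rng), s, q, p, m) == msg
```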

Smart and Vercauteren [18] observed that the underlying ring structure of these schemes makes it possible to realize homomorphic (batch) Add and Mult operations, i.e., our $\ell$-Add and $\ell$-Mult. Specifically, though $F(X)$ is typically irreducible over $\mathbb{Q}$, it may nonetheless factor modulo $p$: $F(X) = \prod_{i=0}^{\ell-1} F_i(X) \pmod p$. In this case, the plaintext space of the scheme also factors: $\mathbb{A}_p \cong \bigoplus_{i=0}^{\ell-1} \mathbb{A}_{\mathfrak{p}_i}$, where $\mathfrak{p}_i$ is the ideal in $\mathbb{A}$ generated by $p$ and $F_i(X)$. In particular, the Chinese Remainder Theorem applies, and the plaintext space is partitioned into $\ell$ independent non-interacting “plaintext slots”, which is precisely what we need for component-wise $\ell$-Add and $\ell$-Mult. The decryption formula recovers the “aggregate plaintext” $a \leftarrow [[\langle c, s \rangle \bmod F(X)]_q]_p$, and this aggregate plaintext is decoded to get the individual plaintext elements, roughly via $z_i \leftarrow a \bmod (F_i(X), p) \in \mathbb{A}_{\mathfrak{p}_i}$.
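The CRT view of plaintext slots can be checked on a small example, assuming $m = 5$ and $p = 11$ (so $p \equiv 1 \pmod m$ and $\Phi_5$ splits into four linear factors $X - \rho_i$ modulo 11). Decoding slot $i$ as $a \bmod (X - \rho_i, p) = a(\rho_i) \bmod p$, the sketch lets one verify that multiplying aggregate plaintexts modulo $(\Phi_5(X), 11)$ multiplies the slots component-wise. The parameters and helper names are illustrative.

```python
def poly_mul(f, g, p):
    """Product of coefficient lists (lowest degree first) with coefficients in F_p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % p
    return out


def poly_mod(f, F, p):
    """Remainder of f modulo the monic polynomial F over F_p."""
    f = [x % p for x in f]
    d = len(F) - 1
    for k in range(len(f) - 1, d - 1, -1):   # cancel the top coefficient each time
        coef = f[k]
        if coef:
            for j in range(d + 1):
                f[k - d + j] = (f[k - d + j] - coef * F[j]) % p
    f = f[:d]
    return f + [0] * (d - len(f))


def decode(a, roots, p):
    """Slot i holds a mod (X - rho_i, p), i.e. the value a(rho_i) mod p."""
    return [sum(c * pow(r, e, p) for e, c in enumerate(a)) % p for r in roots]


m, p = 5, 11
PHI = [1] * m                                            # Phi_5 = 1 + X + ... + X^4
ROOTS = [x for x in range(2, p) if pow(x, m, p) == 1]    # roots of Phi_5 mod 11
a, b = [1, 2, 3, 4], [5, 0, 7, 1]
ab = poly_mod(poly_mul(a, b, p), PHI, p)
print(decode(a, ROOTS, p), decode(b, ROOTS, p), decode(ab, ROOTS, p))
```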

4.2  Implementing  Group  Actions  on  FHE  Plaintext  Slots 

While component-wise Add and Mult are straightforward, getting different plaintext slots to interact is more challenging. For ease of exposition, suppose at first that $F(X)$ is the degree-$(m-1)$ polynomial $\Phi_m(X) = (X^m - 1)/(X - 1)$ for $m$ prime, and that $p \equiv 1 \pmod m$. Thus our ring $\mathbb{A}$ above is the ring of integers of the $m$th cyclotomic number field. In this case $F(X)$ factors into linear terms modulo $p$, $F(X) = \prod_{i=0}^{m-2} (X - \rho_i) \pmod p$ with $\rho_i \in \mathbb{F}_p$. Hence we obtain $\ell = m - 1$ plaintext slots, each slot holding an element of the finite field $\mathbb{F}_p$ (i.e., in this case $\mathbb{A}_{\mathfrak{p}_i}$ above is equal to $\mathbb{F}_p$).

To get $\Phi_m$ to factor modulo $p$ into linear terms we must have $p \equiv 1 \pmod m$, so $p > m$. Also we need $m = \Omega(\lambda)$ to get security (since $m$ is roughly the dimension of the underlying lattice). This means that to get $\Phi_m$ to factor into linear terms we must use plaintext spaces that are somewhat large (in particular we cannot directly use $\mathbb{F}_2$). Later in this section we sketch the more elaborate algebra needed to handle the general (and practical) case of non-prime $m$ and $p \ll m$, where $\Phi_m$ may not factor into linear terms. This is covered in more detail in Appendix C. For now, however, we concentrate on the simple case where $\Phi_m$ factors into linear terms modulo $p$.

Recall that ciphertexts are vectors over $\mathbb{Z}_q[X]/\Phi_m(X)$, so each entry in these vectors corresponds to an integer polynomial. Consider now what happens if we simply replace $X$ with $X^i$ inside all these polynomials, for some exponent $i \in \mathbb{Z}_m^*$, $i > 1$. Namely, for each polynomial $f(X)$, we consider $f^{(i)}(X) = f(X^i) \bmod \Phi_m(X)$. Notice that if we were using polynomial arithmetic modulo $X^m - 1$ (rather than modulo $\Phi_m(X)$) then this transformation would just permute the coefficients of the polynomials. Namely, $f^{(i)}$ has the same coefficients as $f$ but in a different order, which means that if the coefficient vector of $f$ has small norm then the same holds for the coefficient vector of $f^{(i)}$. In Appendix D we show that using a different notion of “size” of a polynomial (namely, the norm of the canonical embedding of a polynomial rather than the norm of its coefficient vector), we can conclude the same also for mod-$\Phi_m$ polynomial arithmetic. Namely, the mapping $f(X) \mapsto f(X^i) \bmod \Phi_m(X)$ does not change the “size” of the polynomial. To simplify presentation, below we describe everything in terms of coefficient vectors and arithmetic modulo $X^m - 1$. The actual mod-$\Phi_m$ implementation that we use is described in Appendix D (and a slightly different implementation is described in Appendix E).
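For arithmetic modulo $X^m - 1$, the map $f(X) \mapsto f(X^i)$ sends the coefficient of $X^j$ to the coefficient of $X^{ij \bmod m}$, so for $i \in \mathbb{Z}_m^*$ it only permutes coefficients and leaves the coefficient norm unchanged. A minimal sketch (the helper name `automorph` is ours):

```python
def automorph(f, i, m):
    """Coefficients of f(X^i) mod (X^m - 1): X^j is sent to X^(i*j mod m)."""
    out = [0] * m
    for j, c in enumerate(f):
        out[(i * j) % m] += c
    return out


f = [1, 2, 3, 4, 5, 6, 7]
g = automorph(f, 3, 7)         # 3 is a unit mod 7
print(g)                       # [1, 6, 4, 2, 7, 5, 3]
print(sorted(g) == sorted(f))  # True: same coefficients, hence the same norm
```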

Let us now consider the effect of the transformation $X \mapsto X^i$ on decryption. Let $c = (c_0(X), c_1(X))$ and $s = (s_0(X), s_1(X))$ be ciphertext and secret-key vectors, and let $b = \langle c, s \rangle \bmod (X^m - 1, q)$ and $a = b \bmod p$. Denote $c^{(i)} = (c_0(X^i), c_1(X^i)) \bmod (X^m - 1)$, and define $s^{(i)}$, $b^{(i)}$ and $a^{(i)}$ similarly. Since $\langle c, s \rangle = b \pmod{X^m - 1, q}$, we have that

$$c_0(X)s_0(X) + c_1(X)s_1(X) = b(X) + q \cdot r(X) + (X^m - 1)\,s(X) \quad (\text{over } \mathbb{Z}[X])$$

for some integer polynomials $r(X), s(X)$, and therefore also

$$c_0(X^i)s_0(X^i) + c_1(X^i)s_1(X^i) = b(X^i) + q \cdot r(X^i) + (X^{mi} - 1)\,s(X^i) \quad (\text{over } \mathbb{Z}[X]).$$

Since $X^m - 1$ divides $X^{mi} - 1$, we also have

$$c_0(X^i)s_0(X^i) + c_1(X^i)s_1(X^i) = b(X^i) + q \cdot r(X^i) + (X^m - 1)\,S(X) \quad (\text{over } \mathbb{Z}[X])$$

for some integer polynomial $S(X)$. That is, $b^{(i)} = \langle c^{(i)}, s^{(i)} \rangle \bmod (X^m - 1, q)$. Clearly, we also have $a^{(i)} = b^{(i)} \pmod p$.

This means that if $c$ decrypts to the aggregate plaintext $a$ under $s$, then $c^{(i)}$ decrypts to $a^{(i)}$ under $s^{(i)}$!
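The derivation can be checked numerically in the simplified mod-$(X^m - 1)$ setting. The sketch picks a random $c$ and a small $s$ (no actual encryption is needed, since the identity holds for any vectors), computes $b = \langle c, s \rangle \bmod (X^m - 1, q)$, and confirms that $\langle c^{(i)}, s^{(i)} \rangle \bmod (X^m - 1, q)$ equals $b^{(i)}$ and that reducing mod $p$ gives $a^{(i)}$. The parameter values and helper names are arbitrary illustrative choices.

```python
import random


def polymul_mod(f, g, m):
    """Product in Z[X]/(X^m - 1) (cyclic convolution of coefficient lists)."""
    out = [0] * m
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[(i + j) % m] += fi * gj
    return out


def automorph(f, i, m):
    """Coefficients of f(X^i) mod (X^m - 1)."""
    out = [0] * m
    for j, c in enumerate(f):
        out[(i * j) % m] += c
    return out


def center(x, q):
    """Representative of x modulo q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def inner(c, s, m):
    """<c, s> = c0*s0 + c1*s1 in Z[X]/(X^m - 1)."""
    return [u + v for u, v in zip(polymul_mod(c[0], s[0], m), polymul_mod(c[1], s[1], m))]


rng = random.Random(7)
m, q, p, i = 16, 97, 2, 5                      # i is a unit modulo m
c = tuple([rng.randrange(q) for _ in range(m)] for _ in range(2))
s = tuple([rng.randrange(-1, 2) for _ in range(m)] for _ in range(2))

b = [center(x, q) for x in inner(c, s, m)]     # b = <c, s> mod (X^m - 1, q)
a = [x % p for x in b]                         # aggregate plaintext a
c_i = tuple(automorph(cj, i, m) for cj in c)
s_i = tuple(automorph(sj, i, m) for sj in s)
b_i = [center(x, q) for x in inner(c_i, s_i, m)]

assert b_i == automorph(b, i, m)                       # equals b^(i)
assert [x % p for x in b_i] == automorph(a, i, m)      # reduces to a^(i)
```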

The cryptosystems from [3, 4] have a mechanism for “key switching” (which is also applicable to the scheme from [5]), transforming a ciphertext $c$ that decrypts to $a$ under $s$ to a new ciphertext $c'$ that decrypts to the same $a$ under some other secret key $s'$. Using the same mechanism, we can translate the transformed ciphertext $c^{(i)}$ into one that decrypts to $a^{(i)}$ under another key $s'$ of our choice. We can even translate it back to a ciphertext decryptable under the original $s$ if we are willing to assume circular security. Using the BGV cryptosystem [5, 4, 3] with appropriate parameters, key switching can be accomplished in time $\tilde{O}(\lambda)$. (See Appendices D and E for details on our variants of the BGV scheme [5].)

But how does this new aggregate plaintext $a^{(i)}$ relate to the original $a$? Here we appeal to Galois theory, which tells us that when decoding the aggregate $a^{(i)}$ (which we do roughly by setting $z_j \leftarrow a^{(i)} \bmod (F_j, p)$), the set of $z_j$'s that we get is exactly the same as when decoding the original aggregate $a$, albeit in a different order. Roughly, this is because each of our plaintext slots corresponds to a root of the polynomial $F(X)$, and the transformations $X \mapsto X^i$, which are precisely the elements of the Galois group, permute these roots. In other words, by transforming $c \mapsto c^{(i)}$ (followed by key switching), we can permute the plaintext slots inside the packed ciphertext. Moreover, in our simplified case the Galois group $\mathbb{Z}_m^*$ is cyclic, so these permutations are rotations of the slots. Arranging the slots appropriately (slot $j$ corresponds to the root $\rho^{g^j}$ for a fixed root $\rho$ and a generator $g$ of $\mathbb{Z}_m^*$), the transformation $c \mapsto c^{(g^i)}$ rotates the slots by exactly $i$ positions; thus we get the group of rotations that we were using in Section 3.1. In general the situation is a little more complicated, but the above intuition can still be made to hold; for more details see Appendix C.
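The rotation claim can be checked in the simplified prime case, assuming the toy parameters $m = 11$, $p = 23$ (so $p \equiv 1 \pmod{11}$) and the generator $g = 2$ of $\mathbb{Z}_{11}^*$. Slot $j$ is decoded as $a(\rho^{g^j}) \bmod p$ for a primitive 11th root of unity $\rho$ modulo 23, and the sketch confirms that replacing $a(X)$ by $a(X^{g^k}) \bmod (X^{11} - 1)$ rotates the ten slots by $k$ positions. It works on the aggregate plaintext only, leaving out encryption and key switching.

```python
def automorph(f, i, m):
    """Coefficients of f(X^i) mod (X^m - 1)."""
    out = [0] * m
    for j, c in enumerate(f):
        out[(i * j) % m] += c
    return out


def slots(a, rho, g, m, p):
    """Decode: slot j holds a(rho^(g^j)) mod p, for j = 0, ..., m - 2."""
    out = []
    for j in range(m - 1):
        root = pow(rho, pow(g, j, m), p)
        out.append(sum(c * pow(root, e, p) for e, c in enumerate(a)) % p)
    return out


m, p, g = 11, 23, 2                        # p = 1 (mod m); g generates Z_11^*
rho = next(x for x in range(2, p) if pow(x, m, p) == 1)   # primitive 11th root mod 23
a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]      # an aggregate plaintext (coefficients mod 23)
base = slots(a, rho, g, m, p)
for k in range(m - 1):
    rotated = slots(automorph(a, pow(g, k, m), m), rho, g, m, p)
    assert rotated == base[k:] + base[:k]  # X -> X^(g^k) rotates the slots by k
```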

The general case. In the general case, when $m$ is not a prime, the polynomial $\Phi_m(X)$ has degree $\phi(m)$ (where $\phi(\cdot)$ is Euler's totient function), and it factors mod $p$ into a number of same-degree irreducible factors. Specifically, the degree of the factors is the smallest integer $d$ such that $p^d \equiv 1 \pmod m$, and the number of factors is $\ell = \phi(m)/d$ (which is of course an integer), $\Phi_m(X) = \prod_{j=0}^{\ell-1} F_j(X) \pmod p$. This means that we have $\ell$ plaintext slots, each isomorphic to the finite field $\mathbb{F}_{p^d}$, and an aggregate plaintext is a degree-$(\phi(m) - 1)$ polynomial over $\mathbb{F}_p$.
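The slot structure in the general case is easy to compute: $d$ is the multiplicative order of $p$ modulo $m$, and $\ell = \phi(m)/d$. A small sketch, assuming $\gcd(p, m) = 1$ and $m > 1$ (the helper names are ours, and the brute-force totient is only meant for small $m$):

```python
from math import gcd


def totient(m):
    """Euler's phi by brute force (fine for the small m used here)."""
    return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)


def mult_order(p, m):
    """Smallest d >= 1 with p^d = 1 (mod m)."""
    assert m > 1 and gcd(p, m) == 1
    d, x = 1, p % m
    while x != 1:
        x = x * p % m
        d += 1
    return d


def slot_structure(m, p):
    """Return (d, ell): Phi_m splits mod p into ell = phi(m)/d factors of degree d."""
    d = mult_order(p, m)
    return d, totient(m) // d


print(slot_structure(31, 2))   # (5, 6)
```

For instance, $m = 31$ and $p = 2$ give six slots, each holding an element of $\mathbb{F}_{2^5}$.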

Suppose that we want to evaluate homomorphically a circuit over some underlying field $\mathbb{K}_n = \mathbb{F}_{p^n}$; then we need to find an integer $m$ such that $\Phi_m(X)$ factors mod $p$ into degree-$d$ factors, where $d$ is divisible by $n$. This way we could directly embed elements of the underlying plaintext space $\mathbb{K}_n$ inside our plaintext slots that hold elements of $\mathbb{F}_{p^d}$, and addition and multiplication of plaintext slots will directly correspond to additions and multiplications of elements in $\mathbb{K}_n$. (This follows since $\mathbb{K}_n = \mathbb{F}_{p^n}$ is a subfield of $\mathbb{F}_{p^d}$ when $n$ divides $d$.)

Note that each plaintext slot will only have $n \log p$ bits of relevant information, i.e., the underlying element of $\mathbb{F}_{p^n}$, but it takes $d \log p$ bits to specify. We thus get an “embedding overhead” factor of $d/n$ even before we encrypt anything. We therefore need to choose our parameter $m$ so as to keep this overhead to a minimum.

Even for a non-prime $m$, the Galois group $\mathrm{Gal}(\mathbb{Q}[X]/\Phi_m(X))$ consists of all the transformations $X \mapsto X^i$ for $i \in \mathbb{Z}_m^*$; hence there are exactly $\phi(m)$ of them. As in the simplified case above, if we have a ciphertext $c$ that decrypts to an aggregate plaintext $a$ under $s$, then $c^{(i)}$ decrypts to $a^{(i)}$ under $s^{(i)}$. Differently from the simple case, however, not all members of the Galois group induce permutations on the plaintext slots, i.e., decoding the aggregate plaintext $a^{(i)}$ does not necessarily give us the same set of (permuted) plaintext elements as decoding the original $a$. Instead, $\mathrm{Gal}(\mathbb{Q}[X]/\Phi_m(X))$ contains a subgroup $\mathcal{G} = \{(X \mapsto X^{p^j}) : j = 0, 1, \ldots, d - 1\}$ corresponding to the Frobenius automorphisms modulo $p$.* This subgroup does not permute the slots at all, but the quotient group $\mathcal{H} = \mathrm{Gal}/\mathcal{G}$ does. Clearly, $\mathcal{G}$ has order $d$ and $\mathcal{H}$ has order $\phi(m)/d = \ell$. In Appendix C we show that the quotient group $\mathcal{H}$ acts as a transitive permutation group on our $\ell$ plaintext slots, and since it has order $\ell$ then it must be sharply transitive. In the general case we therefore use this group $\mathcal{H}$ as our permutation group for the purpose of Lemma 7. Another complication is that the automorphisms that we can compute are elements of $\mathrm{Gal}$ and not elements in the quotient group $\mathcal{H}$. In Appendix C we also show how to emulate the permutations in $\mathcal{H}$ via the use of coset representatives in $\mathrm{Gal}$.
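Identifying $X \mapsto X^i$ with $i \in \mathbb{Z}_m^*$, the Frobenius subgroup $\mathcal{G}$ corresponds to the powers of $p$ modulo $m$, and a set of coset representatives of $\langle p \rangle$ in $\mathbb{Z}_m^*$ gives one Galois element per member of the quotient $\mathcal{H}$. The sketch below only enumerates such representatives (it does not compute how they act on the slots, which is the subject of Appendix C); it assumes $\gcd(p, m) = 1$ and uses illustrative helper names.

```python
from math import gcd


def frobenius_subgroup(p, m):
    """Exponents p^j mod m of the Frobenius maps X -> X^(p^j); needs gcd(p, m) = 1."""
    G, x = [1], p % m
    while x != 1:
        G.append(x)
        x = x * p % m
    return G


def coset_representatives(p, m):
    """One representative per coset of <p> in Z_m^* (a stand-in for H = Gal/G)."""
    G = frobenius_subgroup(p, m)
    seen, reps = set(), []
    for u in range(1, m):
        if gcd(u, m) == 1 and u not in seen:
            reps.append(u)
            seen.update(u * g % m for g in G)
    return reps, G


reps, G = coset_representatives(2, 31)
print(len(G), len(reps))   # 5 6
```

The counts match $|\mathcal{G}| = d$ and $|\mathcal{H}| = \phi(m)/d = \ell$ from the text.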


4.3  Parameter  Setting  for  Low-Overhead  FHE

Given the background from above (and the modification of the BGV cryptosystem [5] in Appendices D or E), we explain how to set the parameters for our variant of the BGV scheme so as to get a low-overhead FHE scheme. Below we first show how to evaluate depth-$L$ circuits with average width $\Omega(\lambda)$ with overhead of only $O(L) \cdot \mathrm{polylog}(\lambda)$, and then use bootstrapping to get overhead of $\mathrm{polylog}(\lambda)$ regardless of depth.


Plaintext-Space Terminology and Notations. The discussion below refers to three different “plaintext spaces”:

* The group $\mathcal{G}$ is called the decomposition group at $p$ in the literature.

- The “underlying plaintext space”: The circuit that we want to evaluate homomorphically is an arithmetic circuit over some (finite) ring, and that finite ring is the “underlying plaintext space”. We typically think of the underlying plaintext space as being just $\mathbb{F}_2$, but it is sometimes convenient to use other spaces (e.g., $\mathbb{F}_{2^8}$ when computing AES, or perhaps $\mathbb{F}_p$ for some 32-bit prime $p$ in other applications).

In this work we always assume that the underlying plaintext space is small, either of constant size or at most of size polynomial in $\lambda$. Moreover, we assume that it is a field, namely $\mathbb{K}_n = \mathbb{F}_{p^n}$ for some prime $p$ and integer $n \ge 1$.

- The “embedded plaintext space”: This is what is held in each of our plaintext slots. For example, we could have underlying space $\mathbb{F}_2$, but embed our bits in elements of $\mathbb{F}_p$ for some larger prime $p$, or maybe in elements of $\mathbb{F}_{2^d}$ for some $d > 1$. (In the former case we need to emulate binary XOR using a degree-2 polynomial mod $p$; in the latter case multiplication and addition work as expected.)

- The “aggregate plaintext space”: This is the plaintext space that is natively encrypted in the cryptosystem: An element in the aggregate plaintext space is a polynomial in some $\mathbb{F}_p[X]$, and as explained above it encodes (via CRT) an $\ell$-vector over the embedded plaintext space.
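When bits are embedded in $\mathbb{F}_p$ for an odd prime $p$, addition mod $p$ no longer computes XOR; the degree-2 polynomial $x + y - 2xy$ does, on inputs in $\{0, 1\}$. This particular polynomial is our choice of illustration (the text only says a degree-2 polynomial is needed), and the prime $p = 257$ is arbitrary:

```python
def xor_mod_p(x, y, p):
    """x + y - 2xy (mod p): agrees with XOR on bits x, y in {0, 1}."""
    return (x + y - 2 * x * y) % p


def and_mod_p(x, y, p):
    """AND needs no emulation: it is plain multiplication."""
    return x * y % p


p = 257
print([(x, y, xor_mod_p(x, y, p)) for x in (0, 1) for y in (0, 1)])
# [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Each emulated XOR costs one extra multiplication, which is the kind of additive depth term the parameter discussion below accounts for.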

When choosing parameters for our FHE construction, we are given the depth and width of the circuits that we need to evaluate homomorphically, as well as the underlying plaintext space and the security parameter. We then want to choose the “embedded” and “aggregate” plaintext spaces and all the other parameters so as to minimize the overhead, namely the ratio between the time that it takes to evaluate the underlying circuits homomorphically and the number of gates in them. We describe two methods for choosing the parameters: One is likely to be more efficient in practice, but we can only prove that it yields low overhead for either small underlying plaintext spaces (of size $\mathrm{polylog}(\lambda)$) or very wide circuits (of width $\Omega(\lambda \cdot p^n)$). The other (simpler) method can be shown to work for any poly-size underlying plaintext space and circuits of width $\Omega(\lambda)$, but is almost certain to yield worse performance in practice.

In either approach, we begin by lower-bounding the dimension of the lattice that we need (in order to get security), thus getting a lower bound on our parameter $m$ (recall that we will eventually get a dimension-$\phi(m)$ lattice). Once we have this lower bound $M$, we either pick $m = p^{ns} - 1 \ge M$ for some integer $s$, or just choose $m$ as $p' - 1$ for some prime number $p'$ sufficiently larger than $M$. In the former case we have the “embedded plaintext space” $\mathbb{F}_{p^{ns}}$, into which we can directly embed the underlying space $\mathbb{F}_{p^n}$, and in the latter case we need to emulate $\mathbb{F}_{p^n}$ arithmetic using polynomials over $\mathbb{F}_{p'}$.

Once we set the parameter $m$ and get the corresponding “embedded plaintext space”, we can easily compute the packing parameter $\ell$ and all the other parameters.


Step 1. Lower-Bounding the Dimension. Suppose that we want to evaluate homomorphically circuits of depth $L$ over some small finite field $\mathbb{F}_{p^n}$, with average width $w$ and maximum width $W = \mathrm{poly}(\lambda)$, where $\lambda$ is the security parameter. Clearly, for security parameter $\lambda$ we need ciphertexts of size at least $\Omega(\lambda)$, so we cannot hope to evaluate any homomorphic operation in time $o(\lambda)$. To get low overhead, we therefore must be able to pack at least $\ell = \Omega(\lambda)$ plaintext slots (from our “embedded” space) into one ciphertext. This means that we only get a low-overhead implementation when the width of the underlying circuits is at least $\Omega(\lambda)$.

From Theorem 2 we know that for any packing parameter $\ell$ we can evaluate depth-$L$ circuits using a network of $\ell$-fold gates of depth $L' = O(L \cdot \log W \cdot \log \ell)$. (If we use the second approach below for choosing the parameter $m$ then we need another additive term of $L \cdot \log(p^n) = O(L \cdot \log \lambda)$ to emulate $\mathbb{F}_{p^n}$ arithmetic using mod-$m$ polynomials.) We will show below that it is sufficient to choose either $\ell = O(\lambda)$ or $\ell = O(p^n \cdot \lambda) \le \mathrm{poly}(\lambda)$ (depending on which of the two approaches we use), but in either case we have $L' \le c \cdot L \cdot \log W \cdot \log \lambda$ for some constant $c$ that we can compute from the given parameters.

Recall that the BGV cryptosystem needs $L'$ different moduli $q_i$ when evaluating a depth-$L'$ network. When implementing arithmetic operations over a characteristic-$p$ field and working with dimension-$M$ lattices, the largest modulus needs to be $q_0 = (M \ldots$ (for some constant $c' < 2$) to get the homomorphic evaluation functionality, and $M \ge \lambda \cdot \log q_0$ to get security. Plugging in all these constraints, we get a lower bound on the dimension of the lattice, $M \ge c'' \cdot L \cdot \lambda \log \lambda \cdot \log W \cdot \log p$, for some constant $c''$ that we can compute from the given parameters (note that $M = \tilde{O}(L \cdot \lambda)$).


Step 2. Choosing the parameter $m$. Below we will choose our parameter $m$ so as to get $\phi(m) \ge M$. We use the following lemma, whose proof is in Appendix B.

Lemma 8. For all positive integers $m$ we have $m/\phi(m) = O(\log \log m)$.

We will then choose our parameter $m$ larger than $c^* M$ for some $c^* = O(\log \log M)$, to ensure that $\phi(m) \ge M$.
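Lemma 8 can be explored numerically. Products of the first few primes give the largest ratio $m/\phi(m)$ for their size, and the sketch below prints that ratio next to $\ln \ln m$ for such $m$, using a trial-division totient (adequate for small inputs). This illustrates the slow growth rate; it is not a proof, and the choice of test values is ours.

```python
from math import log


def totient(m):
    """Euler's phi via trial-division factorization of m."""
    result, n, f = m, m, 2
    while f * f <= n:
        if n % f == 0:
            while n % f == 0:
                n //= f
            result -= result // f
        f += 1
    if n > 1:                  # a single prime factor larger than sqrt(n) remains
        result -= result // n
    return result


for m in [6, 30, 210, 2310, 30030, 510510]:   # products of the first primes
    print(m, round(m / totient(m), 3), round(log(log(m)), 3))
```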

Approach 1: Using Extension Fields. Setting $s = \lceil \log_{p^n}(c^* M + 1) \rceil$, we see that the integer $m = p^{ns} - 1$ satisfies all our requirements. On one hand it is large enough: $m \ge c^* M$ by construction. On the other hand, for $d = n \cdot s$ we clearly have $p^d \equiv 1 \pmod m$, which is what we need in order to use the “embedded plaintext space” $\mathbb{F}_{p^d}$ with the “aggregate plaintext space” $\mathbb{F}_p[X]/\Phi_m(X)$.
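Approach 1 can be sketched as a small parameter calculator, assuming the caller supplies $c^*$ (the text only asserts $c^* = O(\log \log M)$, so the value used below is an arbitrary illustration). It finds the smallest $s$ with $p^{ns} \ge c^* M + 1$, sets $m = p^{ns} - 1$, and reports $d$, $\phi(m)$ and $\ell = \phi(m)/d$; the function and key names are hypothetical.

```python
from math import gcd


def totient(m):
    """Euler's phi via trial-division factorization of m."""
    result, n, f = m, m, 2
    while f * f <= n:
        if n % f == 0:
            while n % f == 0:
                n //= f
            result -= result // f
        f += 1
    if n > 1:
        result -= result // n
    return result


def mult_order(p, m):
    """Smallest d >= 1 with p^d = 1 (mod m)."""
    assert m > 1 and gcd(p, m) == 1
    d, x = 1, p % m
    while x != 1:
        x = x * p % m
        d += 1
    return d


def approach1(p, n, M, c_star):
    """m = p^(n*s) - 1 with s = ceil(log_{p^n}(c_star*M + 1))."""
    target = c_star * M + 1
    s = 1
    while p ** (n * s) < target:     # smallest s >= 1 with p^(n*s) >= target
        s += 1
    m = p ** (n * s) - 1
    d = mult_order(p, m)             # degree of the factors of Phi_m mod p
    phi = totient(m)
    return {"s": s, "m": m, "d": d, "phi": phi, "ell": phi // d}


print(approach1(2, 1, 1000, 3))
# {'s': 12, 'm': 4095, 'd': 12, 'phi': 1728, 'ell': 144}
```

With $p = 2$, $n = 1$, $M = 1000$ and $c^* = 3$ this yields $\phi(m) = 1728 \ge M$ and an embedding overhead of $d/n = s = 12$.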

Moreover, the “embedding overhead” $d/n = s$ is small: since $M = \tilde{O}(L \cdot \lambda)$ and $s \le \log_2(c^*M + 1)$, clearly $s = O(\log(L \cdot \lambda))$. Thus the number of bits that it takes to specify an “aggregate plaintext” is only a factor of $O(\log(L \cdot \lambda))$ larger than what you need to specify all the elements of the “underlying plaintext space” that are embedded in this aggregate plaintext.

However, in some cases the parameter $m$ itself (and therefore the lattice dimension) could be large. Note that we have $M = \tilde{O}(L \cdot \lambda)$, and since $s = \lceil \log_{p^n}(c^*M + 1) \rceil$, then $p^{ns} < (c^*M + 1) \cdot p^n$. If the size of the underlying plaintext space (i.e., $p^n$) is polylogarithmic, then we have $m = \tilde{O}(L \cdot \lambda)$, which is what we need. However, if the underlying plaintext size is larger, say $p^n \approx \lambda$, then we could have $m = \tilde{O}(L \cdot \lambda^2)$. In this case we can no longer hope to evaluate homomorphic operations in time $\tilde{O}(L \cdot \lambda)$ (since the ciphertext size is too large).

If the circuits that we want to evaluate are very wide (i.e., of width $\Omega(\lambda \cdot p^n)$), then we can just pack sufficiently many plaintext slots inside each ciphertext to get the overhead down. We can do this since the “embedding overhead” is logarithmic. But for narrower circuits, say of width $O(\lambda + p^n)$, we just don't have enough plaintext to put in all these slots, hence our overhead increases.

We point out that we may be able to do better than $m = p^{ns} - 1$; for example, we can use any $m'$ such that $\phi(m') > M$ and $m'$ divides $p^{ns} - 1$. But it is not clear that such $m' < m$ exists (for example, when $p = 2$ then $p^{ns} - 1$ could be a prime number). It is also permissible to choose some $s' > s$ and then choose $m'$ that divides $p^{ns'} - 1$ with $\phi(m') > M$. As long as $s' < \mathrm{polylog}(L \cdot \lambda)$ we still have only a polylog “embedding overhead”, and $m'$ may be much smaller than $m = p^{ns} - 1$. Unfortunately we were not able to prove that such $s' < \mathrm{polylog}(L \cdot \lambda)$ and $m' \le \tilde{O}(L \cdot \lambda)$ always exist; we consider this an interesting open problem.
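The relaxation above (any divisor $m'$ of $p^{ns'} - 1$ with $\phi(m') > M$) is easy to explore numerically for small parameters. The sketch below uses hypothetical helper names and brute-force factorization that only works for toy sizes; it returns the smallest such $m'$ over a range of $s'$ and says nothing about the asymptotic open problem.

```python
def euler_phi(m: int) -> int:
    """Euler's phi-function by trial division (toy sizes only)."""
    result, x, k = m, m, 2
    while k * k <= x:
        if x % k == 0:
            while x % k == 0:
                x //= k
            result -= result // k
        k += 1
    if x > 1:
        result -= result // x
    return result


def divisors(x: int) -> list:
    """All positive divisors of x, in increasing order."""
    small, large = [], []
    k = 1
    while k * k <= x:
        if x % k == 0:
            small.append(k)
            if k != x // k:
                large.append(x // k)
        k += 1
    return small + large[::-1]


def smallest_m_prime(p: int, n: int, M: int, s: int, s_max: int):
    """Smallest m' dividing p^(n*s') - 1 for some s' in [s, s_max] with
    phi(m') > M. Returns (s', m'), or None if there is no candidate."""
    best = None
    for s_prime in range(s, s_max + 1):
        for m_prime in divisors(p ** (n * s_prime) - 1):  # increasing order
            if euler_phi(m_prime) > M:
                if best is None or m_prime < best[1]:
                    best = (s_prime, m_prime)
                break  # the first hit is the smallest for this s'
    return best


# Toy numbers: p = 2, n = 8, M = 1000, trying s' = 2 and s' = 3.
print(smallest_m_prime(2, 8, 1000, 2, 3))  # (2, 1285), versus m = 2^16 - 1 = 65535
```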

Approach 2: Using Prime Fields. An alternative, simpler approach is to just pick $m = p' - 1$ for a prime number $p'$ sufficiently larger than $M$ (so as to get $\phi(m) > M$), and set our “embedded plaintext space” to be $\mathbb{F}_{p'}$. This gives us the “simple case” that we discussed earlier in this section, where $\Phi_m(X)$ factors into linear terms mod $p'$. Note that in this case we clearly have $m = \tilde{O}(M)$, so (a) the “embedding overhead” is at most $O(\log M) = O(\log(L\lambda))$, and (b) as long as we work with circuits of width $\Omega(\lambda)$ we can pack enough plaintext elements into each ciphertext to get low overhead.
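A minimal sketch of Approach 2, assuming an arbitrary illustrative $c^* = 3$ and hypothetical helper names: it finds the smallest prime $p' > c^*M$ with $\phi(p' - 1) > M$, then checks that $\Phi_m(X)$, for $m = p' - 1$, has $\phi(m)$ distinct roots mod $p'$. Since $p' \nmid m$, those roots are exactly the elements of $\mathbb{F}_{p'}^*$ of multiplicative order $m$ (the primitive roots), so $\Phi_m$ splits into linear factors.

```python
def is_prime(x: int) -> bool:
    if x < 2:
        return False
    k = 2
    while k * k <= x:
        if x % k == 0:
            return False
        k += 1
    return True


def prime_factors(x: int) -> list:
    """Distinct prime factors of x, by trial division."""
    out, k = [], 2
    while k * k <= x:
        if x % k == 0:
            out.append(k)
            while x % k == 0:
                x //= k
        k += 1
    if x > 1:
        out.append(x)
    return out


def euler_phi(x: int) -> int:
    result = x
    for r in prime_factors(x):
        result -= result // r
    return result


def approach2_prime(M: int, c_star: int) -> int:
    """Smallest prime p' > c*M with phi(p' - 1) > M, so that m = p' - 1 works."""
    q = c_star * M + 1
    while not (is_prime(q) and euler_phi(q - 1) > M):
        q += 1
    return q


def count_roots_of_phi_m(q: int) -> int:
    """Roots of Phi_m mod q for m = q - 1: elements whose order is exactly m.
    a has order m iff a^(m/r) != 1 for every prime r dividing m."""
    m = q - 1
    factors = prime_factors(m)
    return sum(1 for a in range(1, q)
               if all(pow(a, m // r, q) != 1 for r in factors))


q = approach2_prime(1000, 3)
print(q, q - 1, euler_phi(q - 1))                   # 3011 3010 1008
print(count_roots_of_phi_m(q) == euler_phi(q - 1))  # True
```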
This solution has a few drawbacks, however. One relatively minor drawback is that the native operations of the scheme are now over a characteristic-$p'$ field, and if $p' > p$ then the bound $M$ on the dimension will be slightly larger than before (since the noise in fresh ciphertexts is now of the form $p' \cdot e$ rather than $p \cdot e$). A more serious problem is that each gate of the underlying circuit must now be emulated using a polynomial mod $p'$. We note, however, that this only results in a logarithmic slowdown: it is not hard to see that arithmetic over $\mathbb{F}_{p^n}$ can be emulated by mod-$p'$ circuits of depth and size $O(n \cdot \log p)$ (e.g., express these operations as binary circuits and emulate that binary circuit mod $p'$).
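The binary-circuit emulation in the last sentence can be sketched in Python. Each bit is held in $\mathbb{Z}_{p'}$ as 0 or 1; XOR becomes the mod-$p'$ polynomial $a + b - 2ab$ and AND becomes $ab$. As an example we multiply in $\mathbb{F}_{2^8}$ (AES representation, modulus $X^8 + X^4 + X^3 + X + 1$) using only those two gates, and compare against a direct byte multiplication. The prime $p' = 3011$ is an arbitrary illustrative choice, and the sketch checks correctness only, not the $O(n \log p)$ depth and size accounting.

```python
P_PRIME = 3011  # hypothetical prime p'; any prime p' > 2 behaves the same here


def xor_gate(a: int, b: int) -> int:
    """XOR of bits a, b in {0, 1}, as the mod-p' polynomial a + b - 2ab."""
    return (a + b - 2 * a * b) % P_PRIME


def and_gate(a: int, b: int) -> int:
    """AND of bits a, b in {0, 1}, as the mod-p' product ab."""
    return (a * b) % P_PRIME


def to_bits(v: int) -> list:
    return [(v >> i) & 1 for i in range(8)]  # little-endian coefficients


def from_bits(bits: list) -> int:
    return sum(b << i for i, b in enumerate(bits))


def gf256_mul_emulated(x: list, y: list) -> list:
    """Multiply two F_{2^8} elements (bit vectors) using only mod-p' gates."""
    prod = [0] * 15
    for i in range(8):                 # carry-less schoolbook product
        for j in range(8):
            prod[i + j] = xor_gate(prod[i + j], and_gate(x[i], y[j]))
    for k in range(14, 7, -1):         # reduce using X^8 = X^4 + X^3 + X + 1
        t, prod[k] = prod[k], 0
        for e in (4, 3, 1, 0):
            prod[k - 8 + e] = xor_gate(prod[k - 8 + e], t)
    return prod[:8]


def gf256_mul(a: int, b: int) -> int:
    """Reference F_{2^8} multiplication on bytes (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


print(hex(from_bits(gf256_mul_emulated(to_bits(0x57), to_bits(0x83)))))  # 0xc1
```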

Once we have determined the parameter $m$ and the “embedded plaintext space”, all the other parameters of the scheme easily follow, and we obtain the following theorem:

Theorem 3. For security parameter $\lambda$, any $t$-gate, depth-$L$ arithmetic circuit of average width $\Omega(\lambda)$ over underlying plaintext space $\mathbb{F}_{p^n}$ (with $p^n < \mathrm{poly}(\lambda)$) can be evaluated homomorphically in time $t \cdot O(L) \cdot \mathrm{polylog}(\lambda)$.

4.4  Achieving  Depth-Independent  Overhead 

Theorem 3 implies that we can implement shallow arithmetic circuits with low overhead, but when the circuit gets deeper the dependence of the overhead on $L$ causes the overhead to increase. Recall that the reason for this dependence on the depth is that in the BGV cryptosystem [3], the moduli get smaller as we go up the circuit, which means that for the first layers of the circuit we must choose moduli of bitsize $\Omega(L)$.

As explained in [3], the dependence on the depth can be circumvented by using bootstrapping. Namely, we can start with a modulus which is not too large, then reduce it as we go up the circuit, and once the modulus becomes too small to do further computation we can bootstrap back into larger-modulus ciphertexts, then continue with the computation.

For our purposes, we need to ensure that we bootstrap often enough to keep the moduli small, and yet that the time we spend on bootstrapping does not significantly impact the overhead. Here we apply the analysis from [3], which shows that a packed ciphertext with $\Omega(\lambda)$ slots can be decrypted using a circuit of size $\tilde{O}(\lambda)$ and depth $\mathrm{polylog}(\lambda)$. Hence we can even bootstrap after every layer of the circuit and still keep the overhead polylogarithmic, and the moduli never grow beyond polylogarithmic bitsize. We thus get:

Theorem 4. For security parameter $\lambda$, any $t$-gate arithmetic circuit of average width $\Omega(\lambda)$ over underlying plaintext space $\mathbb{F}_{p^n}$ (with $p^n < \mathrm{poly}(\lambda)$) can be evaluated homomorphically in time $t \cdot \mathrm{polylog}(\lambda)$.
A Additional Optimizations
A.1 Faster Cloning

In Lemma 5 we establish that we can clone $w'$ values using $\ell$-fold operations in time $O((w' \log w')/\ell)$. Below we show how to remove the $\log w'$ term, which would allow us to clone values between levels in the circuit using asymptotically optimal $O(w'/\ell)$ time.

Recall that for the cloning procedure we are given a “multi-array” $A'$ consisting of several $\ell$-element arrays, and also the intended multiplicities $m_1, \ldots, m_w$ of the values in these arrays. As before, denote the maximum intended multiplicity by $M = \max_i\{m_i\}$. The new procedure consists of two main parts:

Decomposition: For $i = 0, 1, \ldots, \lceil \log M \rceil$, construct a “multi-array” $A'_i$ that contains the elements whose intended multiplicity is at least $2^i$, as follows:

Set $A'_0 = A'$. Then for $i > 0$ we compute $A'_i$ from $A'_{i-1}$ by marking the slots of all the elements with intended multiplicity smaller than $2^i$ as empty, and then merging sparse arrays until the multi-array is at least half full (or contains only one array). Note that when computing $A'_i$ from $A'_{i-1}$, we also keep a copy of $A'_{i-1}$ for use in the aggregation part below.

Aggregation: For $i = \lceil \log M \rceil, \ldots, 1, 0$, construct a multi-array $A_i$ as follows. Set $A_{\lceil \log M \rceil} = A'_{\lceil \log M \rceil}$; then for all $i < \lceil \log M \rceil$, concatenate two copies of $A_{i+1}$ with one copy of $A'_i$, and if the result is not half full then merge sparse arrays until it is half full again. The result is $A_i$.

Note that since each of $A_{i+1}$, $A'_i$ is either half full or contains a single array, at most two merge operations are needed in each aggregation step. The output of the cloning procedure is $A_0$.
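The procedure can be simulated on plaintext lists to check the multiplicity bookkeeping. In this sketch (hypothetical names throughout) an $\ell$-slot array is a Python list with `None` for empty slots, and a merge simply compacts the two sparsest arrays into one; in the actual scheme a merge would be realized with permutation/rotation and Select gates on ciphertexts. The decomposition stops at $\lfloor \log_2 M \rfloor$, the last level that can be non-empty; running it up to $\lceil \log M \rceil$ only adds empty levels.

```python
from typing import Dict, List, Optional

Array = List[Optional[int]]  # one ell-slot array; None marks an empty slot


def full(arr: Array) -> int:
    return sum(v is not None for v in arr)


def merge_until_half_full(multi: List[Array], ell: int) -> List[Array]:
    """Merge the two sparsest arrays until at least half of all slots are full
    or only one array is left. While the multi-array is less than half full,
    the two sparsest arrays hold fewer than ell values, so they always fit."""
    multi = [list(a) for a in multi]  # work on copies
    while len(multi) > 1 and 2 * sum(map(full, multi)) < len(multi) * ell:
        multi.sort(key=full)
        a, b = multi.pop(0), multi.pop(0)
        vals = [v for v in a + b if v is not None]
        multi.append(vals + [None] * (ell - len(vals)))
    return multi


def clone(inp: List[Array], mult: Dict[int, int], ell: int) -> List[Array]:
    """Value v ends up with between mult[v] and 2*mult[v] - 1 occurrences."""
    K = max(mult.values()).bit_length() - 1  # largest i with some mult >= 2^i
    # Decomposition: dec[i] keeps only the values with multiplicity >= 2^i.
    dec = [[list(a) for a in inp]]
    for i in range(1, K + 1):
        marked = [[v if v is not None and mult[v] >= 2 ** i else None for v in a]
                  for a in dec[-1]]
        dec.append(merge_until_half_full(marked, ell))
    # Aggregation: A_K = A'_K, then A_i = A_{i+1} ++ A_{i+1} ++ A'_i.
    agg = dec[K]
    for i in range(K - 1, -1, -1):
        agg = merge_until_half_full(agg + agg + dec[i], ell)
    return agg


ell = 8
mult = {v: v % 5 + 1 for v in range(20)}  # multiplicities 1..5
inp = [list(range(0, 8)), list(range(8, 16)), [16, 17, 18, 19] + [None] * 4]
out = clone(inp, mult, ell)
counts: Dict[int, int] = {}
for arr in out:
    for v in arr:
        if v is not None:
            counts[v] = counts.get(v, 0) + 1
print(all(mult[v] <= counts[v] < 2 * mult[v] for v in mult))  # True
```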
Lemma 9. The procedure above is correct, and it uses only $O(w'/\ell + \log w')$ copy and merge operations on $\ell$-element arrays, where $w' = \sum_i m_i$.

Proof. Consider an arbitrary element of the input multi-array $A'$, with intended multiplicity $m_i \in [2^j, 2^{j+1} - 1]$ for some $j$. The decomposition part will output multi-arrays such that this element is in each of $A'_0, \ldots, A'_j$. Then, during the aggregation part, $A_j$ will include one copy of this element, $A_{j-1}$ three copies, $A_{j-2}$ seven copies, and in general $A_{j-k}$ contains $2^{k+1} - 1$ copies. Hence at the end of the aggregation part, $A_0$ includes $2^{j+1} - 1$ occurrences of this element (which is at least $m_i$ but less than $2m_i$).

To analyze complexity, notice that the number of arrays in every multi-array $A'_j$ equals the number of arrays in $A'_{j-1}$ minus the number of merge operations that were used when computing $A'_j$. Since $A'_{\lceil \log M \rceil}$ cannot have fewer than zero arrays, it follows that the total number of merge operations throughout the decomposition part cannot be more than the initial number of arrays, namely $\lceil 2w/\ell \rceil \le \lceil 2w'/\ell \rceil$. We observed above that the aggregation part does at most two merges for each $A_j$, so the total number of merges during this part is at most $2\lceil \log M \rceil \le 2\lceil \log w' \rceil$. Thus the total number of merge operations is bounded by $N = \lceil 2w'/\ell \rceil + 2\lceil \log M \rceil = O(w'/\ell + \log w')$.

Finally, the output multi-array $A_0$ contains at most twice as many occurrences of each element as needed, and it is at least half full. Hence it contains at most $\lceil 4w'/\ell \rceil$ arrays, which means that the entire procedure duplicated arrays at most $\lceil 4w'/\ell \rceil + N = O(w'/\ell + \log w')$ times. □

The procedure above can be made particularly efficient in our case, when used in conjunction with the following optimization: when considering a circuit, we sort the gates in each level according to their fan-out, thus making the input to the cloning procedure sorted by intended multiplicity. Note that the decomposition part now becomes unnecessary; we just define $A'_j$ to be the collection of the first few arrays, namely all the ones that contain elements of intended multiplicity at least $2^j$.

Also important is that once the inputs are sorted, merging arrays does not need the full power of the Permute operation. As long as we keep the full slots in the arrays contiguous, we can use the simple rotation operation to align the two arrays before we merge them. (The same can be done with the “higher-dimensional rotations” that we get in the general case in Section 4.) Hence the entire cloning network can be implemented using only $O(w'/\ell + \log w')$ basic operations of $\ell$-Add, $\ell$-Mult, and $\mathcal{H}$-actions.

A.2 Faster Routing

Tracing through the proofs in Section 2, in conjunction with the more efficient cloning technique from above, one can verify that the $\log W$ term in the statement of Theorem 1 can be made to multiply only the number of $\ell$-Add and $\ell$-Mult gates, not $\ell$-Permute, which can make a big difference in practice. Roughly, the $\log W$ term arises from the fact that we seem to need on the order of $w \cdot \log W$ computation (in the worst case) to route the inter-level wires. Note that such a $\log W$ term does not appear in the overhead of non-batched FHE schemes that operate on singletons rather than arrays. It seems plausible that this term could be eliminated somehow, and we consider this an interesting open problem.


A.3  Powering  (Almost)  for  Free 

In some applications, plaintext elements are not bits or integers, but rather elements in a finite extension field. For example, when implementing homomorphic AES, it may be convenient to use $\mathbb{F}_{2^8}$ as the underlying plaintext space [12, 18]. In these cases, the corresponding Galois group (whose automorphisms we use to permute the slots) also includes the Frobenius automorphism. (This is $x \mapsto x^2$ in the AES example, and more generally $x \mapsto x^p$ when using a characteristic-$p$ field.) We show in Section 4 that applying the Galois group transformations to packed ciphertexts results in almost no additional noise. Thus we get a new function, $\ell$-Frobenius, that raises the $\ell$ slots in parallel to a power of $p$, while adding almost no additional noise. This may not be surprising, since the Frobenius map is an $\mathbb{F}_p$-linear map on $\mathbb{F}_{p^n}$.

In practice this turns out to be a useful optimization for particular functions of interest. For the case of AES, the only non-linear part of AES is inversion in $\mathbb{F}_{2^8}$, which is equivalent to exponentiation to the 254-th power. While this may seem to be high-degree, the Frobenius automorphism allows us to evaluate this power relatively cheaply on $\ell$ elements in parallel. For an $a \in \mathbb{F}_{2^8}$ sitting in a plaintext slot, we use the Frobenius map to compute $a_j = a^{2^j}$ for $j = 1, 2, \ldots, 7$ (these are the 1's in the binary representation of 254), then multiply all the $a_j$ to get $\prod_{j=1}^{7} a_j = a^{254} = a^{-1}$ (for $a \neq 0$; $0$ is mapped to $0$, as in AES). Thus, we can evaluate $a^{-1}$ at the price of a single 7-fold product (in terms of noise), and this 7-fold product can be computed by a depth-3 circuit. The binary affine transformation of the AES S-box is not linear over $\mathbb{F}_{2^8}$, but it is linear over the outputs of the Frobenius automorphisms, and so it is linear in terms of its effect on ciphertext noise (although extracting and packing the bits uses up two more levels in the circuit). The ShiftRows and MixColumns operations take four more levels using our permutation networks, and the matrix multiplication in MixColumns uses another level. An AES round can therefore be accomplished using only a depth-10 circuit (in terms of noise), so a homomorphic implementation of the full AES-128 will take a circuit of depth less than 100. It is therefore plausible that we could implement AES-128 homomorphically without resorting to bootstrapping at all! (We note, however, that many other optimizations are possible, and it is not clear if the approach sketched above is really the most efficient one for implementing AES-128.)
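The inversion trick can be checked on plaintext bytes. In this sketch the Frobenius images $a^{2^j}$ are computed by repeated squaring, which is only a plaintext stand-in: homomorphically they would come from the (almost noise-free) Galois automorphism. The seven images are then multiplied in a balanced tree, and the function reports the resulting multiplicative depth. Function names are hypothetical.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply in F_{2^8} = F_2[X]/(X^8 + X^4 + X^3 + X + 1), the AES field."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def frobenius(a: int, j: int) -> int:
    """a^(2^j), computed here by j plaintext squarings."""
    for _ in range(j):
        a = gf256_mul(a, a)
    return a


def inverse_via_frobenius(a: int):
    """Return (a^254, depth of the product tree).

    254 = 0b11111110, so a^254 = a^2 * a^4 * ... * a^128; the seven
    Frobenius images are multiplied pairwise in a balanced binary tree."""
    layer = [frobenius(a, j) for j in range(1, 8)]
    depth = 0
    while len(layer) > 1:
        nxt = [gf256_mul(layer[k], layer[k + 1]) for k in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])  # odd element passes through to the next layer
        layer, depth = nxt, depth + 1
    return layer[0], depth


inv, depth = inverse_via_frobenius(0x53)
print(hex(inv), depth)  # 0xca 3
```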

B  Proofs 


Lemma 1. Let $S = \{0, \ldots, a-1\} \times \{0, \ldots, b-1\}$ be a set of $ab$ positions, arranged as a matrix of $a$ rows and $b$ columns. For any permutation $\pi$ over $S$, there are permutations $\pi_1, \pi_2, \pi_3$ such that $\pi = \pi_3 \circ \pi_2 \circ \pi_1$ (that is, $\pi$ is the composition of the three permutations) and such that $\pi_1$ and $\pi_3$ only permute positions within each column (these permutations only change the row, not the column, of each element) and $\pi_2$ only permutes positions within each row. Moreover, there is a polynomial-time algorithm that given $\pi$ outputs the decomposition permutations $\pi_1, \pi_2, \pi_3$.

Proof. The basic strategy of the decomposition is that $\pi_2$ will send each element to some address with the same $y$-coordinate as its target destination, and similarly $\pi_3$ will correct all of the $x$-coordinates. The permutation $\pi_1$, on the other hand, serves as a strategic indirection. The reason this indirection is needed (i.e., the reason we cannot decompose $\pi$ just as $\pi_3 \circ \pi_2$ with the properties above) is that several elements in the same row could have the same target $y$-coordinate, and then $\pi_2$ cannot achieve its goal. Thus, $\pi_1$ is used to ensure that, when $\pi_2$ receives its input, no two elements in the same row have the same target column. The only nontrivial part of the proof is showing that a suitable $\pi_1$ always exists.

For $s \in S$, let $s_x$ and $s_y$ denote its $x$ and $y$ coordinates, namely $s = (s_x, s_y)$. Consider a bipartite graph $G = (V_1, V_2, E)$ where $V_1$ and $V_2$ each have $b$ vertices with labels $\{0, \ldots, b-1\}$. For every $s \in S$, we draw an edge from the $V_1$-vertex labeled $s_y$ to the $V_2$-vertex labeled $\pi(s)_y$, and we label the edge ‘$s$’. (We may have more than one edge between the same pair of vertices.) Clearly, this is a bipartite, $a$-regular graph. Therefore $G$'s edges can be partitioned into $a$ perfect matchings, and this partition can be computed efficiently (e.g., using network-flow algorithms). In other words, one can compute in polynomial time a coloring of the edges of $G$ using the colors $\{0, \ldots, a-1\}$, such that for all $i$ the $i$-colored subgraph $G_i$ of $G$ is a perfect matching.

Let $\rho(s)$ denote the color of the edge labeled ‘$s$’. Now, define $\pi_1, \pi_2, \pi_3$ as follows: for all $s = (s_x, s_y) \in S$,

$$\pi_1(s) = (\rho(s), s_y), \qquad \pi_2 \circ \pi_1(s) = (\rho(s), \pi(s)_y), \qquad \pi_3 \circ \pi_2 \circ \pi_1(s) = (\pi(s)_x, \pi(s)_y).$$
Clearly, $\pi_1$ and $\pi_3$ have the claimed property of only permuting within columns, and $\pi_2$ only permutes within rows. All that remains is to establish that they are all well-defined permutations, i.e., that no “collisions” occur. $\pi_1$ is a permutation because no two edges emanating from the $V_1$-vertex labeled $s_y$ have the same color. $\pi_2$ is a permutation, and in particular it permutes elements in row $i$, because the subgraph $G_i$ is a perfect matching. Finally, $\pi_3$ is a permutation since both $\pi_2 \circ \pi_1$ and $\pi$ are permutations and since $\pi = \pi_3 \circ \pi_2 \circ \pi_1$. □
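The proof is constructive, and the following Python sketch follows it directly. It builds the column multigraph, peels off $a$ perfect matchings to obtain the coloring $\rho$, and reads off $\pi_1, \pi_2, \pi_3$. As an assumption of the sketch, each matching is found with Kuhn's augmenting-path algorithm rather than a network-flow routine (a regular bipartite multigraph always has a perfect matching, and removing one leaves a regular graph again). Positions are `(row, column)` pairs, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]  # (row x, column y)


def perfect_matching(b: int, adj: Dict[int, List[Tuple[int, Pos]]]) -> List[Pos]:
    """Kuhn's algorithm on a bipartite multigraph with both sides {0..b-1};
    adj[u] lists (v, label) edges. Returns the labels of a perfect matching."""
    match_r: Dict[int, Tuple[int, Pos]] = {}  # right vertex -> (left vertex, label)

    def augment(u: int, seen: set) -> bool:
        for v, lab in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_r or augment(match_r[v][0], seen):
                match_r[v] = (u, lab)
                return True
        return False

    for u in range(b):
        if not augment(u, set()):
            raise ValueError("no perfect matching")
    return [lab for _, lab in match_r.values()]


def decompose(pi: Dict[Pos, Pos], a: int, b: int):
    """Lemma 1: pi = pi3 o pi2 o pi1, with pi1 and pi3 column-preserving and
    pi2 row-preserving, via an a-edge-coloring of the column multigraph."""
    adj: Dict[int, List[Tuple[int, Pos]]] = {u: [] for u in range(b)}
    for s, t in pi.items():  # edge s_y -> pi(s)_y labeled s
        adj[s[1]].append((t[1], s))
    rho: Dict[Pos, int] = {}
    for color in range(a):  # each color class is one perfect matching
        for lab in perfect_matching(b, adj):
            rho[lab] = color
        adj = {u: [(v, lab) for v, lab in es if lab not in rho] for u, es in adj.items()}
    pi1 = {s: (rho[s], s[1]) for s in pi}
    pi2 = {(rho[s], s[1]): (rho[s], pi[s][1]) for s in pi}
    pi3 = {(rho[s], pi[s][1]): pi[s] for s in pi}
    return pi1, pi2, pi3


rng = random.Random(1)
a, b = 4, 5
cells = [(x, y) for x in range(a) for y in range(b)]
targets = cells[:]
rng.shuffle(targets)
pi = dict(zip(cells, targets))
p1, p2, p3 = decompose(pi, a, b)
print(all(p3[p2[p1[s]]] == pi[s] for s in cells))  # True
```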

Lemma 4. Evaluating $\ell$ permutation networks in parallel, each permuting $k$ items, can be accomplished using $O(k \cdot \log k)$ gates of $\ell$-Add and $\ell$-Mult, and depth $O(\log k)$. Also, evaluating a permutation $\pi$ over $k \cdot \ell$ elements that are packed into $k$ $\ell$-element arrays can be accomplished using $k$ $\ell$-Permute gates and $O(k \log k)$ gates of $\ell$-Add and $\ell$-Mult, in depth $O(\log k)$. Moreover, there is an efficient algorithm that given $\pi$ computes the circuit of $\ell$-Permute, $\ell$-Add, and $\ell$-Mult gates that evaluates it; specifically, we can do it in time $O(k \cdot \ell \cdot \log(k \cdot \ell))$.

Proof. The first statement follows directly from Lemma 3 and the discussion above. The second statement follows from Lemma 1, which says that the permutation $\pi$ can be decomposed as $\pi = \pi_3 \circ \pi_2 \circ \pi_1$, where $\pi_1$ and $\pi_3$ each involve evaluating $\ell$ permutation networks in parallel (one for each of the $\ell$ slot indexes, each over $k$ items), and $\pi_2$ only permutes elements within each $\ell$-element array, and therefore can be done using $k$ gates of $\ell$-Permute and just one level.

The efficiency of computing the circuit that realizes $\pi$ follows from the fact that the decomposition $\pi_1, \pi_2, \pi_3$ can be computed efficiently, as per Lemma 1. In fact, it was shown by Lev et al. [14] that this decomposition can be computed in time $O(k \cdot \ell \cdot \log(k \cdot \ell))$. □

Lemma 5. (i) The cloning procedure from Figure 1 is correct.

(ii) Assuming that at least half the slots in the input arrays are full, this procedure can be implemented by a network of $O(w'/\ell \cdot \log w')$ $\ell$-fold gates of type $\ell$-Add, $\ell$-Mult and $\ell$-Permute, where $w'$ is the total number of full slots in the output, $w' = \sum_i m_i$. The depth of the network is bounded by $O(\log w')$.

(iii) This network can be constructed in time $\tilde{O}(w')$, given the input arrays and the $m_i$'s.

Proof. In each phase $j$, first the number of occurrences of every value is doubled, and next if a value $v_i$ occurs more than $m_i$ times then the excess occurrences are removed. Therefore after the $j$'th phase each value $v_i$ is duplicated $\min(m_i, 2^j)$ times. Denoting the number of full slots after the $j$'th phase by $w_j \stackrel{\mathrm{def}}{=} \sum_i \min(m_i, 2^j)$, we have at the end of phase $j$ some number $k_j$ of $\ell$-slot arrays, where $(k_j - 1)\ell/2 < w_j \le k_j \cdot \ell$, since once the merging part is over we must have at least half the slots full. Correctness now follows easily just by looking at $j = \lceil \log M \rceil$.

Regarding complexity (part (ii)), we note that if the input arrays are at least half full then at the beginning of every iteration we have $k_{j-1} < 2w_{j-1}/\ell \le 2w'/\ell = O(w'/\ell)$ arrays (clearly $w_j \le w'$ for all $j$ by definition). After the duplication step (Line 2) we have $2k_{j-1}$ arrays, and then each merging step (Line 6) removes one array, so we can have at most $2k_{j-1} = O(w'/\ell)$ such steps. Observing that every merge takes a constant number of gates (two $\ell$-Permute gates and one Select operation), we conclude that each phase takes at most $O(w'/\ell)$ $\ell$-fold gates.¹ The number of phases is $\lceil \log M \rceil \le \lceil \log w' \rceil$, and the claimed complexity follows.

Part (iii) follows easily by noting that the network implementing each phase can be constructed in time quasi-linear in the number of slots that are available at the beginning of that phase, just by using greedy algorithms to make all the decisions. (The most time-consuming operation is marking entries as “don't-care”s in Line 4; everything else can be done in time $O(w'/\ell)$.) □

¹ Note that removing redundant values (Line 4) does not take any gates: we leave the arrays unchanged and just mark the redundant values as “don't-care”s.

Theorem 1. Let $\ell$, $t$, $w$ and $W$ be parameters. Then any $t$-gate fan-in-2 arithmetic circuit $C$ with average width $w$ and maximum width $W$ can be evaluated using a network of $O(\lceil t/\ell \rceil \cdot \lceil \ell/w \rceil \cdot \log W \cdot \mathrm{polylog}(\ell))$ $\ell$-fold gates of types $\ell$-Add, $\ell$-Mult, and $\ell$-Permute. The depth of this network of $\ell$-fold gates is at most $O(\log W)$ times that of the original circuit $C$, and the description of the network can be computed in time $\tilde{O}(t)$ given the description of $C$.

Proof. Consider one level of the circuit with $w'$ gates, where in the previous level we computed $w \le 2w'$ input values, packed into $O(\lceil w/\ell \rceil)$ $\ell$-element arrays. Our approach is to first clone and then permute these values so that the $2w'$ input slots of the $w'$ gates are filled correctly. More precisely, these $2w'$ input slots will be arranged in two sets of $\ell$-slot arrays, one set for the left inputs and the other for the right inputs to all the gates. Concatenating these two sets of arrays into two multi-arrays, we arrange the slots such that the left and right inputs to each gate are aligned in the same index in the two multi-arrays. Once all the values are routed to their correct locations in the multi-arrays, the actual computation of the gates in this layer can obviously be evaluated using only $O(\lceil w'/\ell \rceil)$ $\ell$-fold gates of $\ell$-Adds or $\ell$-Mults.

By Lemma 5, we can compute the multi-arrays of $O(w'/\ell)$ $\ell$-element arrays that contain the inputs with sufficient multiplicity using $O(\lceil w'/\ell \rceil \cdot \log w')$ $\ell$-fold gates. The resulting multi-arrays have $O(w')$ slots (more than either the source or target multi-arrays), at least half of which contain “real values” while the other slots contain “don't-care”s. Let $\pi$ be a permutation over these $O(w')$ slots that maps the slots that contain the real values to the appropriate positions in the target multi-arrays. By Lemma 4 we can evaluate $\pi$ with a network of $O(w'/\ell \cdot \log(w'/\ell))$ $\ell$-fold gates, and can compute the structure of that network in time $\tilde{O}(w')$.

The result for the whole circuit follows easily, using as our inductive hypothesis that the $w'$ outputs are indeed packed into $O(\lceil w'/\ell \rceil)$ $\ell$-element arrays for input to the next level. □

Lemma 6. Fix an integer $\ell$ and let $k = \lceil \log \ell \rceil$. Any permutation $\pi$ over $I_\ell = \{0, \ldots, \ell - 1\}$ can be implemented by a $(2k-1)$-level network, with each level consisting of a constant number of rotations and Select operations on $\ell$-arrays.

Moreover, regardless of the permutation $\pi$, the rotations that are used in level $i$ ($i = 1, \ldots, 2k-1$) are always by exactly $2^{|k-i|}$ and $\ell - 2^{|k-i|}$ positions, and the network depends on $\pi$ only via the bits that control the Select operations. Finally, this network can be constructed in time $\tilde{O}(\ell)$ given the description of $\pi$.

Proof. If $\ell$ is a power of two then the network is just a Benes network. Otherwise (i.e., $2^{k-1} < \ell < 2^k$ for some $k$) the basic strategy is to realize a permutation over $I_\ell$ by using two $\ell$-element arrays to realize a Benes permutation network over the first $2^k$ of the $2\ell$ positions. We realize each level of the Benes network using a constant number of rotations and Select operations. Since $2^k > \ell$, clearly any permutation on $I_\ell$ can be expressed as a permutation over the first $2^k$ positions (e.g., one where the last $2^k - \ell$ elements remain fixed).

It remains only to show how to realize a $j$-offset-swap over the first $2^k$ elements using just a constant number of operations on the two $\ell$-slot arrays. Clearly, we can handle all the pairs $(v, v + j)$ where both indexes are in the same array using the rotations by $j$ and $\ell - j$ and two Select operations, applied to each of the arrays. To handle the pairs where $v$ is in the first array and $v + j$ is in the second (at index $v + j - \ell$), we shift the first array by $\ell - j$ and the second array by $j$, then again use two Select operations (one Select on the first array and the shifted version of the second, the other Select on the second array and the shifted version of the first). All in all we have four rotation operations (two for each array) and six Selects. The “Finally” part follows directly from Lemma 3. □
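The within-array case of the offset-swap can be sketched on plaintext lists. The helper names below are hypothetical, and `rotate` and `select` are plaintext stand-ins for the scheme's rotation and Select operations (on ciphertexts, Select with a public 0/1 mask would be $\mathit{mask} \cdot x + (1 - \mathit{mask}) \cdot y$). The sketch shows only how one rotation by $j$, one by $\ell - j$, and two Selects swap all chosen pairs $(v, v + j)$ at once; the cross-array case is not shown.

```python
def rotate(arr: list, k: int) -> list:
    """Cyclic rotation: result[i] = arr[(i + k) % len(arr)]."""
    k %= len(arr)
    return arr[k:] + arr[:k]


def select(mask: list, x: list, y: list) -> list:
    """Slot-wise choice: x[i] where mask[i] is 1, else y[i]."""
    return [xi if m else yi for m, xi, yi in zip(mask, x, y)]


def offset_swap(arr: list, j: int, swap_at: list) -> list:
    """Swap arr[v] and arr[v + j] for every v in swap_at, using the rotations
    by j and ell - j and two Selects. The pairs must be disjoint and lie
    inside this one array (v + j < ell)."""
    ell = len(arr)
    lo, hi = [0] * ell, [0] * ell
    for v in swap_at:
        assert 0 <= v and v + j < ell
        lo[v], hi[v + j] = 1, 1
    assert not any(l and h for l, h in zip(lo, hi)), "pairs overlap"
    # rotate(arr, j)[v] = arr[v + j]; rotate(arr, ell - j)[v + j] = arr[v]
    return select(lo, rotate(arr, j), select(hi, rotate(arr, ell - j), arr))


print(offset_swap(list(range(8)), 2, [0, 1, 4, 5]))  # [2, 3, 0, 1, 6, 7, 4, 5]
```

Only the masks depend on the permutation being routed; the rotation amounts are fixed by $j$, which is the property the lemma relies on.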

Lemma 8. For all positive integers $m$ we have $m/\phi(m) = O(\log\log m)$.

Proof. The “worst case” that maximizes $m/\phi(m)$ is when $m$ is a product of distinct primes $m = p_1 \cdots p_t$, in which case we have $m/\phi(m) = \frac{p_1}{p_1 - 1} \cdots \frac{p_t}{p_t - 1}$. Clearly, the worst case is when the $p_i$'s are the first $t$ primes. In this case, we can use the prime number theorem to argue that $p_t = \mathrm{polylog}(m)$ (actually, something like $\log m$). By Mertens' theorem the product over primes $\prod_{p \le \mathrm{polylog}(m)} p/(p-1)$ is $O(\log\log m)$. □
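A quick numerical illustration (not a proof) of Lemma 8 for the extremal case in the proof, where $m$ is the product of the first $t$ primes. The sketch, with hypothetical helper names, tabulates $m/\phi(m)$ against $\log\log m$; by Mertens' third theorem the quotient tends to $e^{\gamma} \approx 1.78$, so it stays bounded.

```python
import math


def first_primes(count: int) -> list:
    """The first `count` primes, by trial division against earlier primes."""
    primes, c = [], 2
    while len(primes) < count:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes


def primorial_ratios(count: int) -> list:
    """(m, m/phi(m), log log m) for m = product of the first t primes."""
    rows, m, ratio = [], 1, 1.0
    for p in first_primes(count):
        m *= p
        ratio *= p / (p - 1)  # phi(m)/m = prod (1 - 1/p) over p | m
        rows.append((m, ratio, math.log(math.log(m))))
    return rows


for m, ratio, lll in primorial_ratios(12)[2:]:  # start at m = 30
    print(f"{m:>16d}  m/phi(m) = {ratio:6.3f}  ratio / loglog m = {ratio / lll:5.3f}")
```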
C Basic Algebra


To understand our techniques it is first necessary to recap the underlying algebra of cyclotomic fields. We have tried to cover as much detail as needed, but the reader should be aware that a self-contained treatment would be hard to come by in such a short space. We therefore refer the interested reader to [21] for details on cyclotomic fields.

C.1 Reductions of Cyclotomic Fields

We let $\Phi_m(X)$ be the $m$-th cyclotomic polynomial, and let $K = \mathbb{Q}(\zeta_m)$ denote the associated number field. The degree of $K$ is $\phi(m)$, where $\phi(\cdot)$ is Euler's phi-function. Note that asymptotically $\phi(m)$ is of roughly the same size as $m$ (by Lemma 8, $m/\phi(m) = O(\log\log m)$), but for the small values of $m$ that we will use in practice, $\phi(m)$ is roughly 10% to 50% smaller than $m$. We associate $K$ with the set of rational polynomials in $X$ of degree less than $N = \phi(m)$, with multiplication and addition defined modulo $\Phi_m(X)$. We let the ring of integers of $K$ be denoted by $\mathcal{O}_K = \mathbb{Z}[\zeta_m]$.

We now fix a prime $p$ which is neither ramified in $K$ nor an index divisor (i.e., $p$ does not divide $m$). Consider the reduction of $K$ at $p$; we define

$$A_p := \mathbb{Z}_p[X]/\Phi_m(X)$$

to be the ring of polynomials over $\mathbb{Z}_p$ where multiplication and addition are defined modulo $\Phi_m(X)$ and $p$. Note that we assume that the representation of $A_p$ is such that the coefficients are given in the range $(-p/2, p/2]$. In general $A_p$ is not a field but is an algebra, since $\Phi_m(X)$ is generally not irreducible mod $p$.

Since $p$ is neither an index divisor nor ramified, and because $K/\mathbb{Q}$ is Galois, we have that the polynomial $\Phi_m(X)$ splits mod $p$ into $\ell$ distinct factors $F_i(X)$, each of degree $d$, where $\ell \cdot d = \phi(m)$. We then have that

$$A_p \cong \mathbb{Z}_p[X]/F_0(X) \times \cdots \times \mathbb{Z}_p[X]/F_{\ell-1}(X) = L_0 \times \cdots \times L_{\ell-1} =: \bar{A}_p,$$

i.e., the reduction of $K$ modulo $p$ is isomorphic to $\ell$ copies $L_i = \mathbb{Z}_p[X]/F_i(X)$ of $\mathbb{F}_{p^d}$. Since all finite fields of a given size are isomorphic, these copies of $\mathbb{F}_{p^d}$ are all isomorphic to one another. Note that we let $A_p$ denote the representation of the algebra by polynomials modulo $\Phi_m(X)$, and $\bar{A}_p$ denote the algebra as a set of $\ell$ copies of the fields defined by the polynomials $F_i(X)$.
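The slot parameters $d$ and $\ell$ can be computed directly from $m$ and $p$. The sketch below relies on the standard fact that, for $p \nmid m$, every irreducible factor of $\Phi_m(X)$ mod $p$ has degree $d$ equal to the multiplicative order of $p$ modulo $m$, so $\ell = \phi(m)/d$. Function names are hypothetical, and the brute-force $\phi$ is meant for small $m$ only.

```python
from math import gcd


def euler_phi(m: int) -> int:
    """phi(m) by counting units (fine for small m)."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


def mult_order(p: int, m: int) -> int:
    """Smallest d >= 1 with p^d = 1 (mod m); needs m > 1 and gcd(p, m) = 1."""
    assert m > 1 and gcd(p, m) == 1
    d, x = 1, p % m
    while x != 1:
        x = x * p % m
        d += 1
    return d


def slot_structure(m: int, p: int):
    """(phi(m), d, ell): Phi_m(X) mod p splits into ell factors of degree d."""
    phi = euler_phi(m)
    d = mult_order(p, m)
    return phi, d, phi // d


print(slot_structure(257, 2))  # (256, 16, 16)
print(slot_structure(31, 2))   # (30, 5, 6)
```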

We note there is a natural inclusion map $A_p \to \mathcal{O}_K$, defined by mapping each element of $A_p$ to its coset representative with coefficients in $(-p/2, p/2]$. If $a \in \mathcal{O}_K$ then we let $a \bmod p$ denote its image in $A_p$ under the (homomorphic) reduction map, which inverts this inclusion. If $q$ is a prime greater than $p$ then we can also consider elements of $A_p$ as elements of $A_q$, but this inclusion is not a homomorphism (since it only preserves the arithmetic operations “as long as there is no wraparound”).

We will use $A_p$ (resp. $\bar{A}_p$) in two distinct ways. In the first way we use $A_p$ and $\bar{A}_p$ to describe the message space of our scheme; in this case we take $p$ to be small (think $p = 2$, or a 32-bit prime). In the second way, we use $A_q$ (for a large prime $q$) as an approximation of the global object $A$. Looking ahead, the basic construction is that we take an element $a \in A_p$, then form the element of $A_q$ given by $a + p \cdot r$, where $r$ is referred to as the noise. Public operations are then performed, and these will correspond to valid operations in $A_p$ only if the noise term does not become too large (in the sense of the $\infty$-norm of the noise becoming bigger than $q/2$). If the operation does not result in wraparound then we can (upon decrypting) obtain the plaintext in $A_p$.
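The wraparound condition can be seen on a single coefficient. In this toy sketch (hypothetical names, no encryption, just the $a + p \cdot r$ embedding), recovery lifts to the centered range $(-q/2, q/2]$ and then reduces centered mod $p$. Sums and products recover correctly while $|a + p \cdot r|$ stays below $q/2$, and recovery fails once the value wraps around modulo $q$. The sizes $p = 2$, $q = 1009$ are illustrative only.

```python
def centered(x: int, modulus: int) -> int:
    """Representative of x mod modulus in the range (-modulus/2, modulus/2]."""
    r = x % modulus
    return r - modulus if r > modulus // 2 else r


def embed(a: int, r: int, p: int, q: int) -> int:
    """Form a + p*r in Z_q (one coefficient; r plays the role of the noise)."""
    return (a + p * r) % q


def recover(c: int, p: int, q: int) -> int:
    """Lift c to (-q/2, q/2], then reduce (centered) modulo p."""
    return centered(centered(c, q), p)


p, q = 2, 1009
c1, c2 = embed(1, 100, p, q), embed(1, 10, p, q)   # 201 and 21
print(recover((c1 + c2) % q, p, q))         # 0: 1 + 1 mod 2, and 222 < q/2
print(recover(c2 * c2 % q, p, q))           # 1: 1 * 1, and 441 < q/2
print(recover(embed(0, 350, p, q), p, q))   # 1, not 0: 700 > q/2 wraps around
```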

C.2  Underlying  Plaintext  Algebra 

Each message in $\bar{A}_p$ actually corresponds to $\ell$ messages in $\mathbb{F}_{p^d} = \mathbb{Z}_p[X]/F_i(X)$. We call each of these components a “slot”. By the Chinese Remainder Theorem, additive and multiplicative operations in $A_p$ correspond to SIMD operations on the slots. However, in many applications we will be interested in plaintexts where each slot lies in $\mathbb{F}_{p^n}$, for some $n$ dividing $d$. (In particular this includes the important case of $n = 1$.) In addition, an application may have a preferred representation (i.e., preferred polynomial basis) for the underlying field $\mathbb{F}_{p^n}$.

We therefore fix (or are given) an irreducible polynomial $G(X) \in \mathbb{Z}_p[X]$ of degree $n$, which defines the specific polynomial basis we are interested in; we take $G(X) = X - 1$ when $n = 1$. To fix notation we define $\mathbb{K}_n = \mathbb{Z}_p[X]/G(X)$ to denote one copy of this degree-$n$ field, with the given polynomial representation.

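Note that in applications one is given $p$ and $n$, and one then needs to find values of $m$ which enable the above representation. Basic algebra shows that, for $m$ coprime to $p$, the irreducible factors of $\Phi_m(X)$ modulo $p$ all have degree $d$, where $d$ is the smallest positive integer such that $m$ divides $p^d - 1$ (i.e. $d$ is the order of $p$ modulo $m$). Thus, given $p$ and $n$, we need to select $m$ such that this $d$ satisfies $d = s \cdot n$ for some integer $s$. The value $\ell$ is then given by $\phi(m)/d$.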
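The parameter selection just described can be sketched in Python as follows (a minimal sketch; the helper names are hypothetical). `slot_parameters` computes $d$ as the multiplicative order of $p$ modulo $m$ together with $\ell = \phi(m)/d$, and `candidate_moduli` lists the $m$ up to a bound for which $n$ divides $d$. The parameter sets of the three worked examples later in this appendix serve as checks.

```python
from math import gcd

def euler_phi(m: int) -> int:
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

def order_mod(p: int, m: int) -> int:
    """Smallest d >= 1 with p^d = 1 (mod m); requires m > 1 and gcd(p, m) = 1."""
    if m <= 1 or gcd(p, m) != 1:
        raise ValueError("need m > 1 and gcd(p, m) = 1")
    d, x = 1, p % m
    while x != 1:
        x = x * p % m
        d += 1
    return d

def slot_parameters(m: int, p: int) -> tuple[int, int]:
    """Return (d, ell): the degree of each factor of Phi_m mod p, and the number of slots."""
    d = order_mod(p, m)
    return d, euler_phi(m) // d

def candidate_moduli(p: int, n: int, bound: int) -> list[int]:
    """All m <= bound, coprime to p, such that n divides d = ord_m(p)."""
    return [m for m in range(2, bound + 1)
            if gcd(m, p) == 1 and order_mod(p, m) % n == 0]

print(slot_parameters(11, 23))    # (1, 10)
print(slot_parameters(31, 2))     # (5, 6)
print(slot_parameters(257, 2))    # (16, 16)
```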

For each of our fields $L_i = \mathbb{Z}_p[X]/F_i(X)$ there is a homomorphic embedding of $\mathbb{K}_n$ into $L_i$, which we fix and denote by $\psi_{n,i}$; it is an isomorphism in the case when $n = d$. Our basic plaintext space is now defined as $\ell$ copies of $\mathbb{K}_n$, i.e. $\mathbf{A}_n = (\mathbb{K}_n)^\ell$, where addition and multiplication are defined component-wise.
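We therefore can define a map
$$\Psi_n : \begin{cases} \mathbf{A}_n \longrightarrow \mathbf{A}_p, \\ (m_0, \ldots, m_{\ell-1}) \longmapsto \left(\psi_{n,0}(m_0), \ldots, \psi_{n,\ell-1}(m_{\ell-1})\right). \end{cases}$$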

By applying the Chinese Remainder Theorem, given an element $\mathbf{a} \in \mathbf{A}_p$ we can obtain a value $\alpha \in \mathbb{A}_p$; we write $\alpha = \mathrm{CRT}_p(\mathbf{a})$. Note our use of notation: elements in $\mathbb{A}_p$ and $\mathbb{A}_q$ will be represented by lower case Greek letters; elements in $\mathbf{A}_p$ and $\mathbf{A}_n$ will be represented by bold face roman letters (since they are vectors); and elements in $\mathbb{K}_n$ and $L_i$ will be represented by standard lower case roman letters.

We end this discussion of the plaintext space by noting that there is a simple operation that produces the projection map. Consider the element $\pi_i \in \mathbb{A}_p$ defined by the element of $\mathbf{A}_p$ given by the $i$-th unit vector $\mathbf{e}_i$, i.e. $\pi_i = \mathrm{CRT}_p(\mathbf{e}_i)$. Then if $\mathbf{m} = (m_0, \ldots, m_{\ell-1}) \in \mathbf{A}_p$ we have $\pi_i \cdot \mathrm{CRT}_p(\mathbf{m}) = \mathrm{CRT}_p(0, \ldots, 0, m_i, 0, \ldots, 0)$. From $\pi_i$ we can also define a projection on an arbitrary subset $I \subseteq \{0, \ldots, \ell-1\}$ in the obvious way, by defining $\pi_I$ to be the element $\sum_{i \in I} \mathrm{CRT}_p(\mathbf{e}_i)$.
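To make the projection maps concrete, the sketch below uses the parameters of Example 1 below ($m = 11$, $p = 23$), where every slot is a copy of $\mathbb{F}_{23}$ and $\Phi_{11}(X)$ splits into linear factors $X - r$ modulo 23. Reducing modulo $X - r$ is evaluation at $r$, so $\mathrm{CRT}_p$ becomes Lagrange interpolation through the ten roots. The slot ordering chosen here is arbitrary and the helper names are hypothetical; the sketch checks that multiplying by $\pi_I$ in $\mathbb{A}_p$ keeps exactly the slots indexed by $I$.

```python
P, M, DEG = 23, 11, 10                        # A_p = Z_23[X]/Phi_11(X), Phi_11 = 1 + X + ... + X^10
ROOTS = [pow(2, e, P) for e in range(1, M)]   # 2 has order 11 mod 23, so these are the roots of Phi_11

def mul_mod_phi(a, b):
    """Product of two coefficient lists (low degree first) in Z_23[X]/Phi_11(X)."""
    prod = [0] * (2 * DEG - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % P
    for k in range(len(prod) - 1, DEG - 1, -1):   # X^10 = -(1 + X + ... + X^9)
        c, prod[k] = prod[k], 0
        for t in range(DEG):
            prod[k - DEG + t] = (prod[k - DEG + t] - c) % P
    return prod[:DEG]

def slots(a):
    """The vector in A_p: evaluate a at each root (= reduce modulo each factor X - r)."""
    return [sum(c * pow(r, i, P) for i, c in enumerate(a)) % P for r in ROOTS]

def crt(values):
    """CRT_p: the unique polynomial of degree < 10 taking the given slot values."""
    result = [0] * DEG
    for j, rj in enumerate(ROOTS):
        basis, denom = [1], 1
        for k, rk in enumerate(ROOTS):
            if k != j:                            # basis *= (X - rk)
                basis = [((basis[t - 1] if t else 0) - rk * (basis[t] if t < len(basis) else 0)) % P
                         for t in range(len(basis) + 1)]
                denom = denom * (rj - rk) % P
        scale = values[j] * pow(denom, -1, P) % P
        result = [(x + scale * y) % P for x, y in zip(result, basis)]
    return result

def projection(I):
    """pi_I = sum over i in I of CRT_p(e_i)."""
    return crt([1 if i in I else 0 for i in range(DEG)])

msg = [5, 17, 3, 0, 22, 9, 11, 1, 8, 14]
alpha = crt(msg)
print(slots(mul_mod_phi(projection({2}), alpha)))        # [0, 0, 3, 0, 0, 0, 0, 0, 0, 0]
print(slots(mul_mod_phi(projection({0, 4, 7}), alpha)))  # [5, 0, 0, 0, 22, 0, 0, 1, 0, 0]
```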

C.3 Galois Theory of Cyclotomic Fields

The field $K = \mathbb{Q}(\zeta_m)$ is abelian (i.e. has abelian Galois group), with Galois group $\mathrm{Gal}(K/\mathbb{Q}) \cong (\mathbb{Z}/m\mathbb{Z})^*$. If we think of $X$ in the representation of $K$ as denoting a generic primitive $m$-th root of unity $\zeta_m$, then given an element $i \in (\mathbb{Z}/m\mathbb{Z})^*$ the associated element of the Galois group is given by the mapping $\kappa_i : X \mapsto X^i$.

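We now need to consider how the Galois group $\mathrm{Gal}(K/\mathbb{Q})$ works when we consider $K$ modulo $p$, i.e. on $\mathbb{A}_p$ and $\mathbf{A}_p$. Notice that since $\mathbb{A}_p$ is not a field, the usual theorems of Galois theory do not apply (an obvious fact, but worth stating). For elements of the subgroup generated by $p$ (discussed below), the maps defined by the Galois group commute with our functions $\Psi_n$, $\mathrm{CRT}_p$, etc.; the remaining elements also permute the slots, as explained below. Thus, to fix ideas, consider an element $\mathbf{m} = (m_0, \ldots, m_{\ell-1}) \in \mathbf{A}_n = (\mathbb{K}_n)^\ell$. We obtain the corresponding element in $\mathbb{A}_p$ by applying $\alpha = \mathrm{CRT}_p(\Psi_n(\mathbf{m})) \in \mathbb{A}_p$. Now if we apply such an element $\kappa_i$ of $\mathrm{Gal}(K/\mathbb{Q})$ to $\alpha$, we obtain an element $\beta$ such that $\beta = \mathrm{CRT}_p(\Psi_n(\kappa_i(m_0), \ldots, \kappa_i(m_{\ell-1})))$, where $\kappa_i(m_j(X)) = m_j(X^i) \pmod{G(X)}$.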

Considering how automorphisms work on $\mathbf{A}_p$, it is well known that any field $\mathbb{F}_{p^k}$ has Galois group over $\mathbb{Z}_p$ given by the cyclic group $C_k$ of order $k$. Now since $\mathbf{A}_p$ contains the subfields $\mathbb{F}_{p^d}$, we have that $\mathrm{Gal}(K/\mathbb{Q})$ contains the cyclic subgroup $C_d \subseteq (\mathbb{Z}/m\mathbb{Z})^*$. The group $C_d$ is called the decomposition group of a prime ideal lying above $p$ in $K$. It is generated by the element $p \in (\mathbb{Z}/m\mathbb{Z})^*$, which corresponds to the Frobenius map $\kappa_p : X \mapsto X^p$. In what follows we let $\mathcal{G}$ denote this subgroup $C_d$ of $(\mathbb{Z}/m\mathbb{Z})^*$.

Considering how $\mathrm{Gal}(K/\mathbb{Q})$ acts on $\mathbb{K}_n$, we notice that the Galois group of $\mathbb{K}_n$ over $\mathbb{Z}_p$ is given by $C_n \cong C_d/C_{d/n}$, generated by the Frobenius map. The key difference between $\mathbb{K}_n$ and $\mathbb{K}_d$ is that the map $\kappa_{p^n}$ is the identity on the subfield $\mathbb{K}_n$. If we want to restrict to the Galois group of $\mathbb{K}_n$, we let $\mathcal{G}$ denote the subset consisting of a set of representatives for the Galois group of $\mathbb{K}_n$.

Since $(\mathbb{Z}/m\mathbb{Z})^*$ is abelian, all subgroups are normal, and hence we can define quotient groups. We define $\mathcal{H}$ to be the quotient group $(\mathbb{Z}/m\mathbb{Z})^*/\mathcal{G}$; note that $\mathcal{H}$ has order $\ell$. We write $\mathcal{H}$ as a product of cyclic groups $C_{n_1} \times \cdots \times C_{n_t}$ with $n_i$ dividing $n_{i+1}$. As a set of coset representatives for $\mathcal{H}$ we first pick a coset representative $h_i$ for a generator of each $C_{n_i}$, and then as the coset representatives of all other elements we take those elements of $(\mathbb{Z}/m\mathbb{Z})^*$ given by
$$\prod_{i=1}^{t} h_i^{e_i} \quad \text{for } 0 \le e_i < n_i.$$
Thus we can identify $\mathcal{H}$ with a subset of $(\mathbb{Z}/m\mathbb{Z})^*$.
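For small $m$ these groups can be computed directly. The sketch below (helper names hypothetical) forms the decomposition group $\mathcal{G} = \langle p \rangle$ and, for a candidate generator $h$ of a cyclic $\mathcal{H}$, checks that $h^0, \ldots, h^{\ell-1}$ lie in distinct cosets of $\mathcal{G}$. It uses the parameters of Examples 2 and 3 below; for $m = 257$ it also shows that the representatives $3^i$ do not close up into a subgroup, since $3^{16} \equiv 2^{11} \not\equiv 1 \pmod{257}$.

```python
from math import gcd

def units(m):
    return [k for k in range(1, m) if gcd(k, m) == 1]

def cyclic_subgroup(g, m):
    """<g> inside (Z/mZ)^*, listed as g^0, g^1, ..."""
    elems, x = [1], g % m
    while x != 1:
        elems.append(x)
        x = x * g % m
    return elems

def coset_representatives(m, p, h):
    """Representatives h^0, ..., h^(ell-1) of H = (Z/mZ)^*/<p>, assuming h generates H."""
    G = set(cyclic_subgroup(p, m))
    ell = len(units(m)) // len(G)
    reps = [pow(h, i, m) for i in range(ell)]
    cosets = {frozenset(r * g % m for g in G) for r in reps}
    if len(cosets) != ell:
        raise ValueError("h does not generate (Z/mZ)^*/<p>")
    return reps

print(coset_representatives(31, 2, 6))          # [1, 6, 5, 30, 25, 26]
reps = coset_representatives(257, 2, 3)
print(len(reps), pow(3, 8, 257), pow(3, 16, 257), pow(2, 11, 257))   # 16 136 249 249
```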

If we label the roots of $\Phi_m(X)$ in $K$ by $\zeta_m^{i}$, for $i \in (\mathbb{Z}/m\mathbb{Z})^*$, then it is a standard fact that the Galois group acts transitively on these roots. The subgroup $\mathcal{G}$ acts on these roots, and we can partition the set of roots into disjoint sets with respect to the group action of $\mathcal{G}$. That is, we create $\ell = \phi(m)/d$ subsets, each of $d$ elements; we label these subsets $X_0, \ldots, X_{\ell-1}$. Since $\mathrm{Gal}(K/\mathbb{Q})$ acts transitively on the set of roots $\{\zeta_m^{i} : i \in (\mathbb{Z}/m\mathbb{Z})^*\}$, the quotient group $\mathcal{H} = \mathrm{Gal}(K/\mathbb{Q})/\mathcal{G}$ acts transitively on the set $\{X_0, \ldots, X_{\ell-1}\}$.

Since $\mathcal{G}$ is the decomposition group of $p$, the sets $X_i$, each containing $d$ complex roots, can, when reduced modulo $p$, be placed in correspondence with the roots of $F_i(X)$, i.e. of one of the factors of $\Phi_m(X)$ modulo $p$. We need to fix a representative for each set $X_i \bmod p$. Fixing a representative for $X_i \bmod p$ essentially means fixing a root of $F_i(X)$ modulo $p$; one can think of the symbolic root $X$ as being such a root, with all other roots being given by a polynomial in $X$ modulo $p$ of degree less than $d$ (when arithmetic is considered modulo $F_i(X)$). Since $\mathcal{H}$ has order $\ell$ and acts transitively on $\{X_0, \ldots, X_{\ell-1}\}$, for each $i \in \{0, \ldots, \ell-1\}$ there is exactly one element $\sigma_i \in \mathcal{H}$ which sends $X_0$ to $X_i$. If we fix the representative of the set $X_0$ to be $\zeta_m$, then to define the representative of the set $X_i$ we take $\sigma_i \in \mathcal{H}$ and set the representative of $X_i$ to be $\sigma_i(\zeta_m)$. Since defining a representative of $X_i$ essentially means fixing a representation of the field $\mathbb{Z}_p[X]/F_i(X)$, this means that our set of representatives for $\mathcal{H}$ acts “transitively on the plaintext slots” in the following sense: for each pair $i, j \in \{0, \ldots, \ell-1\}$ we have that
$$\sigma_j\left(\sigma_i^{-1}\left(\mathrm{CRT}_p(\Psi_n(0, \ldots, 0, m_i, 0, \ldots, 0))\right)\right) = \mathrm{CRT}_p(\Psi_n(0, \ldots, 0, m_i^{(t)}, 0, \ldots, 0))$$
for some integer $t$, where $m_i$ occupies slot $i$ on the left, $m_i^{(t)}$ occupies slot $j$ on the right, and $m^{(t)} = m^{p^t}$ denotes $t$ applications of the Frobenius map. In the case $n = 1$ we have $m_i^{(t)} = m_i$, and so our set of representatives for $\mathcal{H}$ acts directly as permutations on the slots.

Our main technical contribution to FHE, in both practical and theoretical terms, is based on the properties of the group $\mathcal{H}$ and how it acts on the plaintext slots. Since $\mathcal{H}$ acts transitively as above and we have projection maps, it is clear that, given a vector of slots $(m_0, \ldots, m_{\ell-1}) \in \mathbf{A}_n$, we can map it to an arbitrary permutation of the slots. The naive algorithm for this, consisting of projecting each element, mapping via $\mathcal{H}$ as above, making sure we cope with the possibility of powering by Frobenius, and then recombining via addition, has complexity $O(\ell)$. In Section 3 we showed that an arbitrary permutation on the slots can be realized in $O(t \cdot \log \ell)$ operations, where $t$ is the number of cyclic components of the group $\mathcal{H}$; note $t = O(\log \ell)$. That this algorithm can be applied in our case should be immediate, but to fix ideas we examine how $\mathcal{H}$ acts on the slots when $\mathcal{H}$ is cyclic, and how to construct our offset swaps in this case.


When $\mathcal{H}$ is cyclic. If $\mathcal{H} = \langle h \rangle$ is cyclic we can, by fixing a given value of $F_0(X)$, reorder the factors $F_i(X)$ so that $F_i(X)$ is precisely the factor corresponding to $h^i$. Thus we can consider $\mathcal{H}$ as defining permutations on the factors of $\Phi_m(X)$ modulo $p$. Although $\mathcal{H}$ is rarely cyclic, this case is illustrative of what is occurring, and in practice we can often restrict the number of slots to correspond to the largest cyclic subgroup of $\mathcal{H}$.* We consider three examples of increasing complexity:

Example 1: The simplest case to understand is when the decomposition group is trivial, i.e. $d = 1$. Consider the case of $m = 11$ and $p = 23$: the polynomial $\Phi_m(X)$ factors into ten linear factors modulo 23, and the Galois group $(\mathbb{Z}/m\mathbb{Z})^*$ is cyclic of order 10, generated by the element 2. Since $\mathcal{G} = \langle 1 \rangle$, the procedure above gives $\mathcal{H} = (\mathbb{Z}/m\mathbb{Z})^* = \langle 2 \rangle$. Thus we have ten slots, and we order them such that
$$\kappa_2(\mathrm{CRT}_p(\Psi_n(m_0, m_1, \ldots, m_9))) = \mathrm{CRT}_p(\Psi_n(m_9, m_0, m_1, \ldots, m_8)).$$
Hence $\kappa_2$ produces a cyclic shift of the slots. If we wish to switch the elements in positions $i$ and $j$, for $i < j$, then we only need to apply the following operation (with powers of 2 computed in $(\mathbb{Z}/m\mathbb{Z})^*$):
$$\mathrm{swap}_{i,j}(\alpha) = \kappa_{2^{j-i}}(\pi_i \cdot \alpha) + \kappa_{2^{i-j}}(\pi_j \cdot \alpha) + \pi_{\{0,\ldots,9\} \setminus \{i,j\}} \cdot \alpha.$$
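Example 1 can be checked numerically. The sketch below (helper names hypothetical) represents $\alpha \in \mathbb{A}_p = \mathbb{Z}_{23}[X]/\Phi_{11}(X)$ by its coefficient list and implements $\kappa_k : X \mapsto X^k$ using $X^{11} \equiv 1 \pmod{\Phi_{11}}$. It fixes the slot order by assigning slot $j$ to the root $\zeta^{2^{-j} \bmod 11}$ with $\zeta = 2 \in \mathbb{F}_{23}$; this ordering is one choice (an assumption) that makes $\kappa_2$ the right shift displayed above. It then applies the swap formula.

```python
P, M, DEG, H = 23, 11, 10, 2           # A_p = Z_23[X]/Phi_11(X); H = 2 generates (Z/11Z)^*
ZETA = 2                                # 2 has order 11 in F_23, so it is a root of Phi_11
# Slot j is the residue modulo X - ROOTS[j], with ROOTS[j] = ZETA^(2^-j mod 11).
ROOTS = [pow(ZETA, pow(H, -j, M), P) for j in range(DEG)]

def reduce_cyclic(v):
    """Reduce a polynomial given modulo X^11 - 1 (11 coefficients) modulo Phi_11 and P."""
    top = v[M - 1]                      # X^10 = -(1 + X + ... + X^9) mod Phi_11
    return [(x - top) % P for x in v[:DEG]]

def mul(a, b):
    """Product in A_p: multiply modulo X^11 - 1, then reduce modulo Phi_11."""
    v = [0] * M
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            v[(i + j) % M] = (v[(i + j) % M] + x * y) % P
    return reduce_cyclic(v)

def add(*polys):
    return [sum(cs) % P for cs in zip(*polys)]

def kappa(a, k):
    """Galois automorphism kappa_k: a(X) -> a(X^k), using X^11 = 1 mod Phi_11."""
    v = [0] * M
    for i, c in enumerate(a):
        v[i * k % M] = (v[i * k % M] + c) % P
    return reduce_cyclic(v)

def slots(a):
    return [sum(c * pow(r, i, P) for i, c in enumerate(a)) % P for r in ROOTS]

def crt(values):
    """CRT_p as Lagrange interpolation through ROOTS (result has degree < 10)."""
    result = [0] * DEG
    for j, rj in enumerate(ROOTS):
        basis, denom = [1], 1
        for k, rk in enumerate(ROOTS):
            if k != j:                  # basis *= (X - rk)
                basis = [((basis[t - 1] if t else 0) - rk * (basis[t] if t < len(basis) else 0)) % P
                         for t in range(len(basis) + 1)]
                denom = denom * (rj - rk) % P
        scale = values[j] * pow(denom, -1, P) % P
        result = [(x + scale * y) % P for x, y in zip(result, basis)]
    return result

def proj(I):
    """pi_I = CRT_p of the 0/1 indicator vector of I."""
    return crt([1 if j in I else 0 for j in range(DEG)])

def swap(alpha, i, j):
    """kappa_{2^(j-i)}(pi_i alpha) + kappa_{2^(i-j)}(pi_j alpha) + pi_rest alpha."""
    rest = set(range(DEG)) - {i, j}
    return add(kappa(mul(proj({i}), alpha), pow(H, j - i, M)),
               kappa(mul(proj({j}), alpha), pow(H, i - j, M)),
               mul(proj(rest), alpha))

msg = list(range(1, 11))
alpha = crt(msg)
print(slots(kappa(alpha, 2)))     # [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(slots(swap(alpha, 2, 7)))   # [1, 2, 8, 4, 5, 6, 7, 3, 9, 10]
```

Because $\kappa_a \circ \kappa_b = \kappa_{ab}$, the map $\kappa_{2^s}$ is a right shift by $s$ places, which is why the exponents $2^{j-i}$ and $2^{i-j}$ move slot $i$ to slot $j$ and back.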

Example 2: To see what happens for non-trivial decomposition groups we consider the case of $m = 31$ and $p = 2$. Since $2^5 \equiv 1 \pmod{31}$, the decomposition group at $p$ is cyclic of order 5, i.e. $d = 5$. In this example we find that $\mathrm{Gal}(K/\mathbb{Q})$ factors directly into the product of $\mathcal{G} = \langle 2 \rangle$ and the cyclic subgroup $\langle 6 \rangle$. We can take the set of coset representatives for $\mathcal{H}$ to be this subgroup $\langle 6 \rangle$, so we can identify $\mathcal{H}$ with a subgroup of $\mathrm{Gal}(K/\mathbb{Q})$. This implies that the elements in $\mathcal{H}$ act as direct permutations on the slots, and we do not need to worry about the action of Frobenius. In particular we can define the six slots so that, for a specific representation of $\mathbb{K}_n = \mathbb{F}_{2^5}$, we have
$$\kappa_6(\mathrm{CRT}_2(\Psi_n(m_0, m_1, m_2, m_3, m_4, m_5))) = \mathrm{CRT}_2(\Psi_n(m_5, m_0, m_1, m_2, m_3, m_4)).$$

If we wish to shift to the left, we take the elements of $\mathrm{Gal}(K/\mathbb{Q})$ given by $1/6^i \pmod m$; for example, since $1/6 \equiv 26 \pmod{31}$, we have
$$\kappa_{26}(\mathrm{CRT}_2(\Psi_n(m_0, m_1, m_2, m_3, m_4, m_5))) = \mathrm{CRT}_2(\Psi_n(m_1, m_2, m_3, m_4, m_5, m_0)).$$

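If we wish to switch the elements in positions $i$ and $j$, with $i < j$, for an element $\alpha \in \mathbb{A}_p$, then we apply the following operation
$$\mathrm{swap}_{i,j}(\alpha) = \kappa_{6^{j-i}}(\pi_i \cdot \alpha) + \kappa_{6^{i-j}}(\pi_j \cdot \alpha) + \pi_{\{0,\ldots,5\} \setminus \{i,j\}} \cdot \alpha.$$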

Example 3: The above example, in which $\mathcal{H}$ could be identified with a subgroup of $\mathrm{Gal}(K/\mathbb{Q})$, is not typical. In the general case we have the added complication of dealing with actions of Frobenius when applying automorphisms corresponding to elements in $\mathcal{H}$. We examine this more general situation by means of an example. We take $m = 257$ and $p = 2$. In this case we find that 2 has order 16 modulo $m$, and that the quotient group $\mathcal{H} = \mathrm{Gal}(K/\mathbb{Q})/\langle 2 \rangle$ is cyclic of order 16. We also find that there is no cyclic subgroup of order 16 of $\mathrm{Gal}(K/\mathbb{Q})$ other than $\langle 2 \rangle$ itself. Thus $\mathcal{H}$ cannot be represented as a subgroup of $\mathrm{Gal}(K/\mathbb{Q})$.

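We instead represent $\mathcal{H}$ by the set of coset representatives given by $3^i \bmod m$, for $i = 0, \ldots, 15$. This is valid since $3^8 \bmod m = 136 \notin \langle 2 \rangle$, whilst $3^{16} \bmod m = 249 = 2^{11} \bmod m$. We therefore have 16 slots, each consisting of an element in $\mathbb{K}_n = \mathbb{F}_{2^{16}}$. We fix a specific representation of each slot so that
$$\kappa_3(\mathrm{CRT}_2(\Psi_n(m_0, m_1, \ldots, m_{14}, m_{15}))) = \mathrm{CRT}_2(\Psi_n(m_{15}^{(11)}, m_0, m_1, \ldots, m_{13}, m_{14})),$$
where $m^{(t)} = m^{2^t}$ denotes $t$ applications of the Frobenius map.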

* For implementation purposes restricting the slots in this way is simpler, although for our asymptotic result on FHE with polylog overhead we need to consider the whole of $\mathcal{H}$.

However, we also have
$$\kappa_{86}(\mathrm{CRT}_2(\Psi_n(m_0, m_1, \ldots, m_{14}, m_{15}))) = \mathrm{CRT}_2(\Psi_n(m_1, \ldots, m_{13}, m_{14}, m_{15}, m_0^{(5)})).$$


Note that $(1/3) \bmod m = 86$, but that 86 is not one of our coset representatives for $\mathcal{H}$.

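In other words, to move elements to the right (without wraparound) by $i$ places we apply the map $\kappa_{3^i}$, but to move elements to the left (without wraparound) by $i$ places we need to apply the map $\kappa_{3^{-i} \bmod m}$. Hence if we wish to switch the elements in positions $i$ and $j$, with $i < j$, for an element $\alpha \in \mathbb{A}_p$, then we apply the following operation
$$\mathrm{swap}_{i,j}(\alpha) = \kappa_{3^{j-i}}(\pi_i \cdot \alpha) + \kappa_{3^{i-j} \bmod m}(\pi_j \cdot \alpha) + \pi_{\{0,\ldots,15\} \setminus \{i,j\}} \cdot \alpha.$$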

Hence, although the underlying algebra is different when $\mathcal{H}$ cannot be identified with a subgroup of $\mathrm{Gal}(K/\mathbb{Q})$, the method to obtain a swap is exactly the same.

These examples show that for cyclic groups we can realize any transposition via scalar multiplication by the $\pi_I$ and application of the maps $\kappa_i$. The above technique also allows us to realize the offset swaps from Definition 1 for any subset $T \subseteq S = \{0, \ldots, \ell-1\}$ and any $i$. The following technique works when $\mathcal{H} = \langle h \rangle$ is a cyclic group generated by $h$; generalizing to other groups follows from our methods but leads to more complex formulas. Recall that a permutation $\pi$ over $S$ is an $i$-offset swap over $S$ if there exists a subset $T \subseteq S$ such that the pairs $\{(t, t + i \bmod \ell) : t \in T\}$ are disjoint and $\pi$ simply swaps each pair (leaving the other elements fixed).

For a set $A$ we let $A + i = \{j + i \bmod \ell : j \in A\}$ and $\bar{A} = S \setminus A$. We also split $T$ into two sets $T_L$ and $T_R$ such that $t \in T_L$ if and only if $t \in T$ and $t + i < \ell$, i.e. $T_L$ is the set of elements of $T$ that can be swapped with their partner $t + i$ without wraparound, and $T_R = T \setminus T_L$. Algebraically, an offset swap on an element $\alpha$ is then defined in terms of our automorphisms $\kappa_i$ etc. as
$$\pi_{\overline{T \cup (T+i)}} \cdot \alpha + \kappa_{(1/h)^{i}}(\pi_{T_L + i} \cdot \alpha) + \kappa_{h^{i}}(\pi_{T_L} \cdot \alpha) + \kappa_{(1/h)^{\ell-i}}(\pi_{T_R} \cdot \alpha) + \kappa_{h^{\ell-i}}(\pi_{T_R + i} \cdot \alpha).$$
The first term corresponds to those elements which are kept fixed by the offset swap, i.e. those elements neither in $T$ nor in $T + i$. The second term corresponds to those elements shifted to the left by $i$ without wraparound, the third to those elements shifted to the right by $i$ without wraparound, and the final two terms deal with the case of wraparound.
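The five-term decomposition can be checked at the level of slot vectors. In the minimal sketch below (an illustrative assumption) $n = 1$ and $\mathcal{H} = \langle h \rangle$ acts without Frobenius twists, as in Example 1, so $\kappa_{h^k}$ is modelled as a cyclic right shift of the slots by $k$ and multiplication by $\pi_A$ as masking; the function names are hypothetical. Each term only moves the slots it keeps by a shift that does not wrap around, which is what allows the same formula to be used in the setting of Example 3.

```python
def rotate(v, k):
    """Model of kappa_{h^k}: cyclic right shift of the slot vector by k (k may be negative)."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else list(v)

def mask(v, A):
    """Model of multiplication by pi_A: zero every slot outside A."""
    return [x if j in A else 0 for j, x in enumerate(v)]

def offset_swap_terms(v, T, i):
    """Sum of the five terms of the offset-swap formula."""
    ell, T = len(v), set(T)

    def plus_i(A):                                      # A + i
        return {(t + i) % ell for t in A}

    TL = {t for t in T if t + i < ell}                  # pairs without wraparound
    TR = T - TL                                         # pairs with wraparound
    fixed = set(range(ell)) - (T | plus_i(T))           # complement of T u (T + i)
    terms = [mask(v, fixed),
             rotate(mask(v, plus_i(TL)), -i),           # left by i
             rotate(mask(v, TL), i),                    # right by i
             rotate(mask(v, TR), -(ell - i)),           # left by ell - i
             rotate(mask(v, plus_i(TR)), ell - i)]      # right by ell - i
    return [sum(column) for column in zip(*terms)]

def offset_swap_direct(v, T, i):
    """Swap each pair (t, t + i mod ell) directly."""
    ell, out = len(v), list(v)
    for t in T:
        out[t], out[(t + i) % ell] = v[(t + i) % ell], v[t]
    return out

v = list(range(10, 20))
print(offset_swap_terms(v, {0, 8}, 3))   # [13, 18, 12, 10, 14, 15, 16, 17, 11, 19]
```

Here $T = \{0, 8\}$ and $i = 3$ swap the pairs $(0, 3)$ and $(8, 1)$; the second pair wraps around, so it is handled by the last two terms.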


D Using mod-$\Phi_m$ Polynomial Arithmetic

Part of our goal in this paper is to allow implementations of BGV-type cryptosystems over rings of the form $\mathbb{Z}[X]/\Phi_m(X)$ for arbitrary integers $m$, not only when $m$ is a prime. Although most of the underlying algebra works the same way regardless of what $m$ is, we do not have a good bound on the increase in the size of coefficient vectors when using mod-$\Phi_m$ arithmetic.

Recall that for every ring $\mathcal{R} = \mathbb{Z}[X]/F(X)$ there is a “ring constant” $\gamma_{\mathcal{R}}$ such that for all $a, b \in \mathcal{R}$ it holds that $\|ab\| \le \gamma_{\mathcal{R}} \cdot \|a\| \cdot \|b\|$, where $\|x\|$ is the norm of the coefficient vector of $x$ (say, the $\ell_\infty$ norm). However, we do not have a good bound on the ring constant for rings of the form $\mathcal{R}_m = \mathbb{Z}[X]/\Phi_m(X)$, and in particular $\gamma_{\mathcal{R}_m}$ can be super-polynomial in $m$. Specifically, $\gamma_{\mathcal{R}_m}$ is related to the sizes of the coefficients of $\Phi_m(X)$, which are known to get rather large [1]. In our context, this means that when multiplying two “short” ciphertexts, the result can be “longer” than the product of the two by this factor $\gamma_{\mathcal{R}_m}$, for which we do not have a good bound.
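The growth of the coefficients of $\Phi_m(X)$ is easy to observe. The following Python sketch (function names hypothetical) computes $\Phi_m$ exactly over the integers by dividing $X^m - 1$ by $\Phi_d$ for every proper divisor $d$ of $m$. The smallest $m$ with a coefficient outside $\{-1, 0, 1\}$ is $m = 105$, where the coefficient $-2$ appears.

```python
from functools import lru_cache

def poly_divmod(num, den):
    """Divide integer polynomials (low degree first) by a monic den; return (quotient, remainder)."""
    num = list(num)
    quot = [0] * max(len(num) - len(den) + 1, 1)
    for k in range(len(num) - len(den), -1, -1):
        c = num[k + len(den) - 1]
        quot[k] = c
        for i, d in enumerate(den):
            num[k + i] -= c * d
    return quot, num[:len(den) - 1]

@lru_cache(maxsize=None)
def cyclotomic(m):
    """Coefficients of Phi_m(X), low degree first."""
    poly = [-1] + [0] * (m - 1) + [1]           # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly, rem = poly_divmod(poly, cyclotomic(d))
            assert not any(rem)
    return tuple(poly)

def height(m):
    """Largest absolute value of a coefficient of Phi_m."""
    return max(abs(c) for c in cyclotomic(m))

print(cyclotomic(12))                                  # (1, 0, -1, 0, 1)
print(min(m for m in range(1, 200) if height(m) > 1))  # 105
print(height(105))                                     # 2
```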
D.1 Canonical Embeddings and Norms

To analyze a cryptosystem that works mod-$\Phi_m$ we therefore use a different measure of the “size” of polynomials: rather than considering the norm of the coefficient vector of a polynomial, we consider the norm of the “canonical embedding” of that polynomial. For an integer $m$, let $V_m$ be the set of complex primitive $m$-th roots of unity. Then for a polynomial $a \in \mathbb{Q}[X]/\Phi_m(X)$, the “canonical embedding” of $a$ is the vector of values that $a$ assumes in all the roots in $V_m$,
$$\mathcal{E}(a) \stackrel{\text{def}}{=} \left( a(\rho^k) \right)_{k \in \mathbb{Z}_m^*},$$
where $\rho$ is a fixed complex primitive $m$-th root of unity (e.g., $\rho = e^{2\pi i/m}$).


More generally, the canonical embedding of an element $a \in \mathbb{Q}[X]/F(X)$ consists of the evaluations of $a$ in all the complex roots of $F$. Below we only use the canonical embeddings for the cases $F(X) = \Phi_m(X)$ and $F(X) = X^m - 1$. Note that $\mathcal{E}(a)$ is in general a vector of complex numbers, and the size of each entry in that vector is the norm (absolute value) of that complex number.


Below we refer to the norm of $\mathcal{E}(a)$ as the “canonical embedding norm” of $a$, and denote it by $\|a\|^{\mathrm{can}}$. Although it is possible to define the “canonical embedding $\ell_p$ norm” for any $\ell_p$, below we always refer to the canonical embedding $\ell_\infty$ norm. Namely,
$$\|a\|^{\mathrm{can}} \stackrel{\text{def}}{=} \|\mathcal{E}(a)\| = \max_{k \in \mathbb{Z}_m^*} |a(\rho^k)|.$$


(Note again that in this section we consistently use $\|\cdot\|$ to refer to the $\ell_\infty$ norm of a vector and not the $\ell_2$ norm.) We extend the canonical embedding norm to vectors over $\mathbb{Q}[X]/\Phi_m(X)$ in the natural way, namely if $\mathbf{a} = (a_0, a_1, \ldots, a_{n-1})$ is an $n$-vector over $\mathbb{Q}[X]/\Phi_m(X)$, then $\|\mathbf{a}\|^{\mathrm{can}} = \max_{i<n} \|a_i\|^{\mathrm{can}}$.
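A direct numerical sketch of these definitions follows, using floating-point evaluation with Python's `cmath` (function names and the choice $m = 15$ are hypothetical). `canonical_embedding` evaluates a coefficient list at $\rho^k$ for $k \in \mathbb{Z}_m^*$ and `can_norm` takes the maximum absolute value. The usage lines check the bound $\|a\|^{\mathrm{can}} \le \phi(m) \cdot \|a\|$ and the submultiplicativity discussed in the next paragraph; because the primitive roots are roots of $\Phi_m$, the product $ab$ need not be reduced modulo $\Phi_m$ before it is evaluated.

```python
import cmath
import math
import random

def canonical_embedding(a, m):
    """E(a) = (a(rho^k)) for k in Z_m^*, with rho = exp(2*pi*i/m); a is a coefficient list."""
    return [sum(c * cmath.exp(2j * math.pi * k * e / m) for e, c in enumerate(a))
            for k in range(1, m) if math.gcd(k, m) == 1]

def can_norm(a, m):
    return max(abs(z) for z in canonical_embedding(a, m))

def poly_mul(a, b):
    """Plain integer polynomial product (no reduction modulo Phi_m)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

m, phi = 15, 8
rng = random.Random(1)
a = [rng.randint(-5, 5) for _ in range(phi)]
b = [rng.randint(-5, 5) for _ in range(phi)]
print(can_norm(a, m), phi * max(abs(c) for c in a))
print(can_norm(poly_mul(a, b), m), can_norm(a, m) * can_norm(b, m))
```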

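It is easy to see that for any element $a \in \mathbb{Q}[X]/\Phi_m(X)$, the canonical embedding norm is not much more than the coefficient norm, namely $\|a\|^{\mathrm{can}} \le \phi(m) \cdot \|a\|$ (where $\|a\|$ is the norm of $a$'s coefficient vector). This follows since each of the $m$-th roots of unity has absolute value one, and we are adding $\phi(m)$ powers of such a root with coefficients bounded by $\|a\|$. Clearly, for any two elements $a, b \in \mathbb{Z}[X]/\Phi_m(X)$ we have $\|a + b\|^{\mathrm{can}} \le \|a\|^{\mathrm{can}} + \|b\|^{\mathrm{can}}$, and since the primitive $m$-th roots of unity are all roots of $\Phi_m(X)$, we also have $\|ab \bmod \Phi_m\|^{\mathrm{can}} = \|ab\|^{\mathrm{can}} \le \|a\|^{\mathrm{can}} \cdot \|b\|^{\mathrm{can}}$.

Similarly, for $n$-vectors $\mathbf{a}, \mathbf{b} \in (\mathbb{Q}[X]/\Phi_m(X))^n$ we get $\|\langle \mathbf{a}, \mathbf{b} \rangle \bmod \Phi_m\|^{\mathrm{can}} \le n \cdot \|\mathbf{a}\|^{\mathrm{can}} \cdot \|\mathbf{b}\|^{\mathrm{can}}$.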

Also, for every $m$ there exists a “ring constant” $c_m$ (which is a real number) such that for all $a \in \mathbb{Z}[X]/\Phi_m(X)$ it holds that $\|a\| \le c_m \cdot \|a\|^{\mathrm{can}}$; see [6] for a discussion of $c_m$. Another property of the canonical embedding norm that we use below is that a nonzero integer polynomial must have norm at least one:


Lemma 10. Let $a \in \mathbb{Z}[X]/\Phi_m(X)$ be nonzero, for some integer $m$. Then $\|a\|^{\mathrm{can}} \ge 1$.

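Proof. Since $a$ is a nonzero integer polynomial, the complex product $\prod_{k \in \mathbb{Z}_m^*} a(\rho^k)$ must be a nonzero integer (it is the norm of $a$ from $\mathbb{Q}(\rho)$ down to $\mathbb{Q}$, which is a rational integer, and no factor vanishes because $\Phi_m$ is irreducible), and therefore it has magnitude at least 1. It follows that some of the terms in the product must have magnitude 1 or more, hence the $\ell_\infty$ norm of $\mathcal{E}(a)$ is at least 1. □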
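Lemma 10 can be sanity-checked numerically: for a nonzero integer polynomial of degree below $\phi(m)$, the product of the entries of $\mathcal{E}(a)$ should come out as a nonzero integer up to floating-point error, and the largest entry should have magnitude at least 1. A minimal sketch with hypothetical helper names and the assumed toy choice $m = 12$:

```python
import cmath
import math
import random

def embedding(a, m):
    """Values a(rho^k) for k in Z_m^*, rho = exp(2*pi*i/m)."""
    return [sum(c * cmath.exp(2j * math.pi * k * e / m) for e, c in enumerate(a))
            for k in range(1, m) if math.gcd(k, m) == 1]

def embedding_product(a, m):
    prod = complex(1)
    for z in embedding(a, m):
        prod *= z
    return prod

m, phi = 12, 4
rng = random.Random(7)
for _ in range(5):
    a = [rng.randint(-3, 3) for _ in range(phi)]
    if any(a):
        z = embedding_product(a, m)
        print(a, round(z.real), max(abs(w) for w in embedding(a, m)))
```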


Modular Reduction in Canonical Embedding. To talk about the canonical norm of elements in $\mathbb{Z}_q[X]/\Phi_m(X)$ (i.e., polynomials reduced both mod $\Phi_m(X)$ and mod $q$), we define the “canonical embedding norm reduced mod $q$”, denoted $|a|_q^{\mathrm{can}}$, as the smallest norm among all the polynomials that are congruent to $a$ modulo $q$. Namely, for $a \in \mathbb{Z}[X]/\Phi_m(X)$ we denote
$$|a|_q^{\mathrm{can}} \stackrel{\text{def}}{=} \min\left\{ \|b\|^{\mathrm{can}} : b \in \mathbb{Z}[X]/\Phi_m(X),\ b \equiv a \pmod q \right\}.$$

(We note that the minimum exists, even though we take it over an infinite set, since the set $\{\mathcal{E}(b) : b \equiv a \pmod q\}$ is a coset of a lattice.) Sometimes we may want to talk about the specific polynomial where the minimum is obtained, namely the polynomial $b$ satisfying $b \equiv a \pmod q$ and $\|b\|^{\mathrm{can}} = |a|_q^{\mathrm{can}}$. If this polynomial is unique, then we call it the “canonical reduction mod $q$ of $a$” and denote it by
$$[a]_q^{\mathrm{can}} \stackrel{\text{def}}{=} \operatorname{argmin}\left\{ \|b\|^{\mathrm{can}} : b \in \mathbb{Z}[X]/\Phi_m(X),\ b \equiv a \pmod q \right\}.$$

We stress that our cryptosystem never needs to compute the canonical embedding (or the canonical reduction, or the canonical norm) of polynomials; it is only in the analysis of the scheme that we use these terms.

Obviously, for any element $a \in \mathbb{Z}[X]/\Phi_m(X)$ and any modulus $q$, the reduced canonical embedding norm is not more than the canonical embedding norm, namely $|a|_q^{\mathrm{can}} \le \|a\|^{\mathrm{can}}$. Similarly, it is easy to check that if $c = ab \bmod (\Phi_m, q)$ then $|c|_q^{\mathrm{can}} \le |a|_q^{\mathrm{can}} \cdot |b|_q^{\mathrm{can}}$. A corollary of Lemma 10 (that we use in our analysis of modulus switching) is that an element with small enough canonical embedding norm must be the unique canonical reduction mod $q$ of its coset:

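Lemma 11. Let $m, q$ be integers, and let $a \in \mathbb{Z}[X]/\Phi_m(X)$ be such that $\|a\|^{\mathrm{can}} < q/2$. Then for any $b \in \mathbb{Z}[X]/\Phi_m(X)$ such that $b \ne a$ but $b \equiv a \pmod q$, it holds that $\|b\|^{\mathrm{can}} > \|a\|^{\mathrm{can}} = |a|_q^{\mathrm{can}}$. Hence for all $b \equiv a \pmod q$ we have $a = [b]_q^{\mathrm{can}}$.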

Proof. Fix any $b \in \mathbb{Z}[X]/\Phi_m(X)$ such that $b \ne a$ but $b \equiv a \pmod q$. Then $(b - a)/q$ is a nonzero integer polynomial, and by Lemma 10 its canonical embedding has an entry of magnitude $\ge 1$. This implies that $\mathcal{E}(b)$ has an entry at distance at least $q$ from the corresponding entry in $\mathcal{E}(a)$. Since that entry in $\mathcal{E}(a)$ has magnitude $< q/2$, the one in $\mathcal{E}(b)$ must have magnitude $> q/2$, and therefore $\|b\|^{\mathrm{can}} > q/2 > \|a\|^{\mathrm{can}}$. It follows that $a$ has the unique smallest canonical embedding norm among all the polynomials in its coset mod $q$. □


D.2 Our Cryptosystem

In terms of operations, our cryptosystem is almost identical to the BGV cryptosystem [3], where all the operations are done modulo $\Phi_m(X)$. However, our analysis of (the functionality of) this cryptosystem is somewhat different, in that we keep track of the canonical norm of “the noise” rather than the norm of its coefficient vector. Specifically, we maintain the invariant that if $\mathbf{c}$ is a ciphertext encrypting the aggregate plaintext $a \in \mathbb{Z}_p[X]/\Phi_m(X)$ relative to secret key $\mathbf{s}$ and modulus $q$, then in the ring $\mathbb{Z}_q[X]/\Phi_m(X)$ we have the equality
$$\langle \mathbf{c}, \mathbf{s} \rangle = p \cdot u + a \pmod{\Phi_m(X), q}, \qquad (1)$$
where $u \in \mathbb{Z}[X]/\Phi_m(X)$ has small canonical norm mod $q$, $|u|_q^{\mathrm{can}} \ll q$.

Decryption. We claim that as long as this invariant holds, we can use $\mathbf{s}$ to decrypt $\mathbf{c}$. This can be done in one of two ways:

- If the “ring constant” $c_m$ happens to be small enough (i.e., such that $c_m \cdot p \cdot \|u\|^{\mathrm{can}}$ is still much smaller than $q$), then $\|p \cdot u\| \le c_m \cdot p \cdot \|u\|^{\mathrm{can}} \ll q$, which means that the coefficient vector of the noise has small norm and decryption works just as in standard BGV cryptosystems. For example, for prime values of $m$ the constant $c_m$ is equal to approximately d/vr [6].

- Otherwise, we “lift” decryption to work modulo $X^m - 1$ rather than modulo $\Phi_m(X)$, and use the fact that the “ring constant” of $\mathbb{Z}[X]/(X^m - 1)$ is small (namely, it is $\sqrt{m}$).

Describing the second option in more detail, Lemma 12 below tells us that there exists an integer polynomial $G \in \mathbb{Z}[X]/(X^m - 1)$ such that $G(\alpha) = m$ for every complex primitive $m$-th root of unity $\alpha$, and $G(\beta) = 0$ for every complex non-primitive $m$-th root of unity $\beta$. This means in particular that $G \equiv m \pmod{\Phi_m}$ (in words, the polynomial $G$ reduces to the constant $m$ modulo $\Phi_m$).

Computing $b \leftarrow G \cdot \langle \mathbf{c}, \mathbf{s} \rangle \bmod (X^m - 1, q)$, we get $b = p \cdot Gu + Ga \pmod{X^m - 1, q}$, due to Equation (1) and the fact that $G \cdot \Phi_m \equiv 0 \pmod{X^m - 1}$ (this product vanishes at every $m$-th root of unity). We now observe that the evaluation of the polynomial $Gu$ in all the $m$-th roots of unity must be small: for the primitive roots this evaluation is only $m$ times that of $u$ (which is small by our invariant), and for the non-primitive roots this evaluation is zero (since $G$ evaluates to zero in these roots). Therefore the canonical norm of $Gu$ in $\mathbb{Z}[X]/(X^m - 1)$ is small, hence so is the norm of its coefficient vector, and so it can be decrypted as in standard BGV cryptosystems. Namely, we have no wraparound, so setting $b' \leftarrow b \bmod p$ we have $b' = Ga \in \mathbb{Z}_p[X]/(X^m - 1)$. If we now further reduce modulo $\Phi_m(X)$, $b'' \leftarrow b' \bmod \Phi_m$, we get $b'' = m \cdot a \in \mathbb{Z}_p[X]/\Phi_m(X)$ (because $G \equiv m \pmod{\Phi_m(X)}$). Finally we can multiply by $(m^{-1} \bmod p)$ to get $a = m^{-1} \cdot b'' \bmod p$.
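The final, noise-free steps of this lifted decryption can be traced in a small example. The sketch below (helper names hypothetical; the toy parameters $m = 12$, $p = 5$ are assumptions, with $\gcd(m, p) = 1$ so that $m^{-1} \bmod p$ exists) starts from $b' = G \cdot a$ in $\mathbb{Z}_p[X]/(X^m - 1)$, reduces modulo $\Phi_m$ to obtain $m \cdot a$, and multiplies by $m^{-1} \bmod p$. The coefficients of $G$ are obtained by rounding the sums $\sum_{i \in \mathbb{Z}_m^*} \rho^{-ij}$ from the proof of Lemma 12.

```python
import cmath
import math
from functools import lru_cache

def poly_divmod(num, den):
    """Divide integer polynomials (low degree first) by a monic den; return (quotient, remainder)."""
    num = list(num)
    quot = [0] * max(len(num) - len(den) + 1, 1)
    for k in range(len(num) - len(den), -1, -1):
        c = num[k + len(den) - 1]
        quot[k] = c
        for i, d in enumerate(den):
            num[k + i] -= c * d
    return quot, num[:len(den) - 1]

@lru_cache(maxsize=None)
def cyclotomic(m):
    poly = [-1] + [0] * (m - 1) + [1]                 # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly, _ = poly_divmod(poly, cyclotomic(d))
    return tuple(poly)

def G_coeffs(m):
    """Coefficient t of G is sum_{i in Z_m^*} rho^(-i*t), rounded to the nearest integer."""
    return [round(sum(cmath.exp(-2j * math.pi * i * t / m)
                      for i in range(1, m) if math.gcd(i, m) == 1).real)
            for t in range(m)]

def cyclic_mul(a, b, m):
    """Product in Z[X]/(X^m - 1)."""
    out = [0] * m
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % m] += x * y
    return out

def unwrap(b_prime, m, p):
    """From b' = G*a mod (X^m - 1, p), recover a in Z_p[X]/Phi_m."""
    _, b2 = poly_divmod(b_prime, cyclotomic(m))       # b'' = m*a mod (Phi_m, p)
    inv_m = pow(m, -1, p)
    return [(inv_m * c) % p for c in b2]

m, p = 12, 5
a = [3, 0, 4, 1]                                      # a in Z_5[X]/Phi_12, degree < phi(12) = 4
b_prime = [c % p for c in cyclic_mul(G_coeffs(m), a, m)]
print(unwrap(b_prime, m, p))                          # [3, 0, 4, 1]
```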

Lemma 12. For any integer $m$ there is an integer polynomial $G_m$ of degree $\le m - 1$, such that $G_m(\alpha) = m$ for every complex primitive $m$-th root of unity $\alpha$, and $G_m(\beta) = 0$ for every complex non-primitive $m$-th root of unity $\beta$. Moreover, the Euclidean norm of $G_m$'s coefficient vector is $\sqrt{m \cdot \phi(m)}$.

Proof. Clearly there exists a complex polynomial of degree $\le m - 1$ which evaluates to $m$ in the primitive $m$-th roots of unity and to zero in the non-primitive $m$-th roots of unity. We only need to show that this polynomial has integer coefficients, and that it has a low-norm coefficient vector.

To show that, let $D$ be the $m \times m$ DFT matrix (i.e., the Vandermonde matrix on the complex $m$-th roots of unity, $D_{ij} = \rho^{ij}$ for some fixed primitive $m$-th root of unity $\rho$). Denote the coefficient vector of $G$ by $\mathbf{g}$, and the vector of values that it assumes in all the $m$-th roots of unity by $\mathbf{v}$ (so $\mathbf{v}$ is a vector of $m$'s and 0's); then $\mathbf{v} = D\mathbf{g}$. Recalling that the inverse of $D$ is $D^{-1} = D^*/m$ (with $D^*$ the conjugate transpose of $D$), and considering the 0-1 vector $\mathbf{v}' = \mathbf{v}/m$, we have that $\mathbf{g} = D^*\mathbf{v}'$. Each coefficient in $G$ is therefore a 0-1 combination of the entries in one row of $D^*$, with the 1's in the positions corresponding to the primitive roots of unity. Specifically, the coefficient of $X^j$ in $G$ is $g_j = \sum_i \rho^{-ij}$, where the sum goes over all indexes $i \in \mathbb{Z}_m^*$. Since the sum is symmetric over the primitive roots of unity, it is fixed by every automorphism of $\mathbb{Q}(\rho)$ and is therefore a rational number; being also an algebraic integer, it must be an integer. Hence $G$ must be an integer polynomial.

Finally, recall that $D^*/\sqrt{m}$ is unitary, hence the $\ell_2$ norm of $\mathbf{g}$ is $\sqrt{m}$ times the $\ell_2$ norm of $\mathbf{v}'$. Since the number of 1's in $\mathbf{v}'$ is exactly $\phi(m)$, the $\ell_2$ norm of $\mathbf{v}'$ is $\sqrt{\phi(m)}$, and therefore the $\ell_2$ norm of $\mathbf{g}$ is $\sqrt{m \cdot \phi(m)}$. □
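Lemma 12 is easy to check numerically. The sketch below (names hypothetical) computes $g_j = \sum_{i \in \mathbb{Z}_m^*} \rho^{-ij}$ in floating point, confirms that each sum is numerically an integer (these are the classical Ramanujan sums $c_m(j)$), evaluates $G_m$ at every $m$-th root of unity, and compares $\sum_j g_j^2$ with $m \cdot \phi(m)$.

```python
import cmath
import math

def G_raw(m):
    """Unrounded coefficients g_t = sum_{i in Z_m^*} rho^(-i*t), rho = exp(2*pi*i/m)."""
    units = [i for i in range(1, m) if math.gcd(i, m) == 1]
    return [sum(cmath.exp(-2j * math.pi * i * t / m) for i in units) for t in range(m)]

def check_lemma12(m, tol=1e-8):
    """Return the integer coefficients of G_m after checking the claims of Lemma 12 (m >= 2)."""
    raw = G_raw(m)
    g = [round(z.real) for z in raw]
    assert all(abs(z - c) < tol for z, c in zip(raw, g)), "coefficients are not integers"
    phi = sum(1 for i in range(1, m) if math.gcd(i, m) == 1)
    for k in range(m):                 # evaluate G_m at rho^k
        value = sum(c * cmath.exp(2j * math.pi * k * t / m) for t, c in enumerate(g))
        expected = m if math.gcd(k, m) == 1 else 0
        assert abs(value - expected) < 1e-6
    assert sum(c * c for c in g) == m * phi
    return g

print(check_lemma12(12))   # [4, 0, 2, 0, -2, 0, -4, 0, -2, 0, 2, 0]
```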

Having described decryption, we now proceed to describe all the other elements of our cryptosystem, namely key generation, encryption, addition, “raw multiplication”, key switching, modulus switching, and Galois group actions. All these components (bar the last) are very similar to their counterparts in the BGV cryptosystem [3], but their analysis is slightly different.

Key Generation. The parameters of the scheme include the integer $m$ (that defines the polynomial $\Phi_m$), the integer $p$ (that defines the aggregate plaintext space $\mathbb{Z}_p[X]/\Phi_m$), and the sequence of moduli $q_0 > q_1 > \cdots > q_L$.

Key generation is as in the ring-LWE-based version of BGV [3] over the ring $\mathbb{Z}[X]/\Phi_m$. That is, for appropriate $N = \mathrm{poly}\log(q_0, m)$, one chooses $s_0, e_{0,1}, \ldots, e_{0,N} \in \mathbb{Z}[X]/\Phi_m$ (with $\ell_\infty$ coefficient norm $\ll q_0$) as well as random elements $\alpha_{0,1}, \ldots, \alpha_{0,N}$, and computes $\beta_{0,i} \leftarrow \alpha_{0,i} s_0 + p \cdot e_{0,i} \bmod (\Phi_m(X), q_0)$.

The level-0 secret key is $\mathbf{s}_0 = [1, s_0]$, and the corresponding public encryption key includes the vectors $\mathbf{b}_i = [\beta_{0,i}, -\alpha_{0,i}]$.

In addition to these keys, the key-generation procedure chooses other secret-key vectors for the other levels, and generates the key-switching matrices between them, as described under Key Switching below.

Encryption. Encryption is as in BGV. An aggregate plaintext $a \in \mathbb{Z}_p[X]/\Phi_m(X)$ is encrypted by choosing random short elements $r_1, \ldots, r_N \in \mathbb{Z}[X]/\Phi_m$ (with $\ell_\infty$ coefficient norm $\ll q_0$) and setting
$$\mathbf{c} = [c_0, c_1] \leftarrow [a, 0] + \sum_{i=1}^{N} r_i \cdot \mathbf{b}_i \bmod (\Phi_m(X), q_0). \qquad (2)$$
(Actually, the $r_i$'s can be chosen as elements of $\mathbb{Z}[X]/\Phi_m$ with 0/1 coefficients, rather than merely being short.)

It is easy to show that semantic security reduces to the hardness of the decision ring-LWE problem for the ring $\mathbb{Z}_{q_0}[X]/\Phi_m$ and the distributions used to sample the short elements.

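To see that our invariant holds with respect to the level-0 secret key $\mathbf{s}_0$ and freshly encrypted ciphertexts, note that Equation (2) implies that $\mathbf{c} \equiv [a, 0] + \sum_{i=1}^{N} r_i \cdot \mathbf{b}_i \pmod{\Phi_m(X), q_0}$, and therefore
$$\langle \mathbf{c}, \mathbf{s}_0 \rangle = a + \sum_{i=1}^{N} r_i \langle \mathbf{b}_i, \mathbf{s}_0 \rangle = a + p \cdot \sum_{i=1}^{N} r_i \cdot e_{0,i} \pmod{\Phi_m(X), q_0},$$
since $\langle \mathbf{b}_i, \mathbf{s}_0 \rangle = \beta_{0,i} - \alpha_{0,i} s_0 = p \cdot e_{0,i}$. Since all the $r_i$'s and $e_{0,i}$'s are small (and therefore also have small canonical embedding norm), the canonical embedding norm of the polynomial $u = \sum_{i=1}^{N} r_i \cdot e_{0,i} \bmod (\Phi_m(X), q_0)$ is small.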
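A toy end-to-end sketch of key generation, encryption and decryption under invariant (1) follows. To keep it short it makes several assumptions not in the text: $m = 16$, so that $\Phi_m(X) = X^8 + 1$ and the direct decryption route applies; a single modulus $q_0 = 2^{20}$; $N = 4$; secret and error coefficients uniform in $\{-1, 0, 1\}$; and no modulus switching. It uses Python's `random` module, which is not a cryptographic RNG, so the code only illustrates the algebra and is not secure. The helper names are hypothetical.

```python
import random

DEG, Q, P, N = 8, 1 << 20, 2, 4        # Z_Q[X]/(X^8 + 1), i.e. m = 16 (toy choices)

def ring_mul(a, b):
    """Negacyclic product: multiplication in Z_Q[X]/(X^8 + 1)."""
    out = [0] * DEG
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < DEG:
                out[i + j] += x * y
            else:
                out[i + j - DEG] -= x * y      # X^8 = -1
    return [c % Q for c in out]

def ring_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def small(rng, lo=-1, hi=1):
    return [rng.randint(lo, hi) % Q for _ in range(DEG)]

def keygen(rng):
    """s_0 small; b_i = [beta_i, -alpha_i] with beta_i = alpha_i*s_0 + P*e_i."""
    s0 = small(rng)
    pk = []
    for _ in range(N):
        alpha = [rng.randrange(Q) for _ in range(DEG)]
        beta = ring_add(ring_mul(alpha, s0), [P * c % Q for c in small(rng)])
        pk.append((beta, [(-c) % Q for c in alpha]))
    return s0, pk

def encrypt(pk, a, rng):
    """c = [a, 0] + sum_i r_i * b_i with 0/1 coefficients in r_i (Equation (2))."""
    c0, c1 = [x % Q for x in a], [0] * DEG
    for b0, b1 in pk:
        r = small(rng, 0, 1)
        c0, c1 = ring_add(c0, ring_mul(r, b0)), ring_add(c1, ring_mul(r, b1))
    return c0, c1

def decrypt(s0, c):
    """<c, [1, s_0]> = a + P*u mod Q; take centered representatives, then reduce mod P."""
    inner = ring_add(c[0], ring_mul(c[1], s0))
    return [(x - Q if x > Q // 2 else x) % P for x in inner]

rng = random.Random(2024)
s0, pk = keygen(rng)
a = [rng.randrange(P) for _ in range(DEG)]
print(decrypt(s0, encrypt(pk, a, rng)) == a)    # True
```

With these parameters every coefficient of $p \cdot u + a$ is at most $65$ in absolute value, far below $q_0/2$, so the centered lift recovers it exactly.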

Addition. Adding two ciphertext vectors that are defined with respect to the same secret key and modulus is just standard addition in $\mathbb{Z}_q[X]/\Phi_m(X)$. Clearly, if $\langle \mathbf{c}, \mathbf{s} \rangle = p \cdot u + a$ and $\langle \mathbf{c}', \mathbf{s} \rangle = p \cdot u' + a'$, then also $\langle \mathbf{c} + \mathbf{c}', \mathbf{s} \rangle = p \cdot (u + u') + (a + a')$, and the canonical embedding norm of $u + u'$ is still small.


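“Raw Multiplication”. As in the BV/BGV family of cryptosystems [5, 4, 3], “raw multiplication” of two ciphertext vectors (defined with respect to the same modulus) is done using tensor product. Namely, if we have ciphertext vector $\mathbf{c}$ which is decrypted to $a$ under $\mathbf{s}$ and $q$, and another vector $\mathbf{c}'$ which is decrypted to $a'$ under $\mathbf{s}'$ and $q$, then we set $\tilde{\mathbf{c}} = \mathrm{vector}(\mathbf{c} \otimes \mathbf{c}') \bmod (\Phi_m(X), q)$ (where $\mathrm{vector}(\cdot)$ opens the matrix into a vector using some appropriate ordering). Denoting $\tilde{\mathbf{s}} = \mathrm{vector}(\mathbf{s} \otimes \mathbf{s}') \bmod (\Phi_m(X), q)$, we thus have

$$\begin{aligned}
\langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle &= \mathbf{s}^{T} (\mathbf{c} \otimes \mathbf{c}')\, \mathbf{s}' = \langle \mathbf{c}, \mathbf{s} \rangle \cdot \langle \mathbf{c}', \mathbf{s}' \rangle \\
&= (p \cdot u + a) \cdot (p \cdot u' + a') = p \cdot (p u u' + u a' + a u') + a a' \pmod{\Phi_m(X), q}.
\end{aligned}$$

Since the canonical embedding norm of $\tilde{u} = p u u' + u a' + a u' \bmod (\Phi_m(X), q)$ is still small, $\tilde{\mathbf{c}}$ is a valid ciphertext with respect to $\tilde{\mathbf{s}}$ and $q$, which is decrypted to $a a'$.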
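The key step above, $\langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle = \langle \mathbf{c}, \mathbf{s} \rangle \cdot \langle \mathbf{c}', \mathbf{s}' \rangle$, is a purely algebraic identity, so it can be checked directly. The sketch below uses a toy ring $\mathbb{Z}_q[X]/(X^4 + 1)$ with $q = 7681$ and hypothetical helper names; row-major flattening is just one possible choice for $\mathrm{vector}(\cdot)$, and the identity holds as long as the same ordering is used for both $\mathbf{c} \otimes \mathbf{c}'$ and $\mathbf{s} \otimes \mathbf{s}'$.

```python
import random

PHI_DEG, Q = 4, 7681   # toy ring Z_q[X]/(X^4 + 1); parameters are illustrative only


def poly_mul(f, g):
    """Negacyclic product in Z_q[X]/(X^4 + 1)."""
    res = [0] * PHI_DEG
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < PHI_DEG:
                res[k] += fi * gj
            else:
                res[k - PHI_DEG] -= fi * gj
    return [x % Q for x in res]


def inner(c, s):
    """<c, s> = sum_j c_j * s_j in Z_q[X]/(X^4 + 1)."""
    acc = [0] * PHI_DEG
    for cj, sj in zip(c, s):
        acc = [(x + y) % Q for x, y in zip(acc, poly_mul(cj, sj))]
    return acc


def tensor_vector(x, y):
    """vector(x tensor y): flatten the outer-product matrix x y^T in row-major order."""
    return [poly_mul(xi, yj) for xi in x for yj in y]


def rand_vec(rng, length=2):
    return [[rng.randrange(Q) for _ in range(PHI_DEG)] for _ in range(length)]


rng = random.Random(5)
c, c2, s, s2 = (rand_vec(rng) for _ in range(4))
lhs = inner(tensor_vector(c, c2), tensor_vector(s, s2))
rhs = poly_mul(inner(c, s), inner(c2, s2))
print(lhs == rhs)   # True
```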


Key Switching. A crucial component of the BV/BGV cryptosystems is the ability to translate a ciphertext with respect to one secret key into a ciphertext that decrypts to the same thing under another secret key. This is used, for example, to translate the “extended ciphertext” that we get from raw multiplication back to a normal ciphertext, or to translate two ciphertext vectors with respect to different keys into ciphertexts with respect to the same key, so that they can be added or raw-multiplied.

Let $\mathbf{s}$ be a secret-key vector over $\mathbb{Z}_q[X]/\Phi_m(X)$, and consider another 2-element secret-key vector $\mathbf{t} \in (\mathbb{Z}_q[X]/\Phi_m(X))^2$ whose first entry is 1. To allow translation from $\mathbf{s}$-ciphertexts to $\mathbf{t}$-ciphertexts, we first encode $\mathbf{s}$ in a redundant manner by computing $2^i \mathbf{s} \bmod q$ for $i = 0, 1, \ldots, \ell = \lfloor \log q \rfloor$ and concatenating all these vectors to form

$$\tilde{\mathbf{s}} = \mathrm{Powersof2}_q(\mathbf{s}) = [\mathbf{s} \mid 2\mathbf{s} \mid 4\mathbf{s} \mid \cdots \mid 2^{\ell}\mathbf{s}] \bmod q.$$

Then we choose a random low-coefficient-norm vector $\mathbf{v}$ over $\mathbb{Z}_q[X]/\Phi_m(X)$ of the same dimension as $\tilde{\mathbf{s}}$ (call this dimension $d$), and a matrix $R \in (\mathbb{Z}_q[X]/\Phi_m(X))^{2 \times d}$ which is chosen at random from the orthogonal space to $\mathbf{t}$, namely $\mathbf{t}R = 0 \pmod{q}$. The key-switching matrix from $\mathbf{s}$ to $\mathbf{t}$ is then set as

$$W = W[\mathbf{s} \to \mathbf{t}] = \begin{pmatrix} \tilde{\mathbf{s}} + p \cdot \mathbf{v} \\ \mathbf{0} \end{pmatrix} + R \bmod (\Phi_m(X), q).$$

Again it is easy to show that if decision ring-LWE is hard for the ring $\mathbb{Z}_q[X]/\Phi_m(X)$ and the distributions used to sample $\mathbf{t}$ and $\mathbf{v}$, then the matrix $W$ above is pseudo-random, even for someone who knows $\mathbf{s}$.

Given a ciphertext vector $\mathbf{c}$ (over $\mathbb{Z}_q[X]/\Phi_m(X)$) that satisfies our invariant with respect to $\mathbf{s}$ and $q$, we use $W$ to translate it into another vector $\mathbf{c}'$ that satisfies our invariant with respect to $\mathbf{t}$ and $q$, as follows. First, for $i = 0, 1, \ldots, \ell = \lfloor \log q \rfloor$ we denote by $\mathbf{c}_i$ the vector over $\mathbb{Z}_2[X]/\Phi_m(X)$ containing the $i$'th bits from all the coefficients of all the entries of $\mathbf{c}$. Namely:

$$\mathbf{c}_0 = \mathbf{c} \bmod 2, \quad\text{and}\quad \mathbf{c}_i = 2^{-i} \cdot \Big( (\mathbf{c} \bmod 2^{i+1}) - \sum_{j<i} 2^j \mathbf{c}_j \Big) \text{ for } i > 0.$$

Then the bit-decomposition of $\mathbf{c}$ is the concatenation of all these vectors,

$$\tilde{\mathbf{c}} = \mathrm{BitDecomp}(\mathbf{c}) = [\mathbf{c}_0 \mid \mathbf{c}_1 \mid \cdots \mid \mathbf{c}_{\ell}].$$
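The two encodings are designed so that $\langle \mathrm{BitDecomp}(\mathbf{c}), \mathrm{Powersof2}_q(\mathbf{s}) \rangle \equiv \langle \mathbf{c}, \mathbf{s} \rangle \pmod{q}$. A minimal sketch of that identity follows, assuming for simplicity that the vector entries are plain integers mod $q$ (the polynomial case applies the same map to every coefficient). The function names mirror the text, but the code itself is only an illustration; `bit_decomp` follows the recursive formula for $\mathbf{c}_i$ literally.

```python
import random


def bit_decomp(c, q):
    """Return [c_0 | c_1 | ... | c_ell] where c_i holds bit i of every entry of c."""
    ell = q.bit_length() - 1            # ell = floor(log2 q)
    c = [x % q for x in c]
    parts = []
    for i in range(ell + 1):
        # c_i = 2^{-i} * ((c mod 2^{i+1}) - sum_{j<i} 2^j c_j), as in the text
        lower = [sum((1 << j) * parts[j][k] for j in range(i)) for k in range(len(c))]
        parts.append([((x % (1 << (i + 1))) - low) >> i for x, low in zip(c, lower)])
    return [bit for part in parts for bit in part]


def powers_of_2(s, q):
    """Return [s | 2s | 4s | ... | 2^ell s] mod q, in the same block order."""
    ell = q.bit_length() - 1
    return [((1 << i) * x) % q for i in range(ell + 1) for x in s]


def dot(x, y, q):
    return sum(a * b for a, b in zip(x, y)) % q


q = 12289
rng = random.Random(1)
c = [rng.randrange(q) for _ in range(3)]
s = [rng.randrange(q) for _ in range(3)]
c_tilde = bit_decomp(c, q)
print(set(c_tilde) <= {0, 1}, dot(c_tilde, powers_of_2(s, q), q) == dot(c, s, q))  # True True
```

Because $\tilde{\mathbf{c}}$ is a 0/1 vector, its norm is small regardless of how large the entries of $\mathbf{c}$ are; the size has been moved into the public, pseudo-random $\tilde{\mathbf{s}}$ part of $W$.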

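Clearly $\tilde{\mathbf{c}}$ has low-norm coefficient vectors, since they are all 0-1 vectors, and we have $\langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle = \langle \mathbf{c}, \mathbf{s} \rangle$ over $\mathbb{Z}_q[X]$ (and therefore also over $\mathbb{Z}_q[X]/\Phi_m(X)$). Switching keys from $\mathbf{s}$ to $\mathbf{t}$ is done simply by setting $\mathbf{c}' \leftarrow W\tilde{\mathbf{c}} \bmod (\Phi_m(X), q)$. To see that this maintains our invariant, assume that for some $a \in \mathbb{Z}_p[X]/\Phi_m(X)$ we have $\langle \mathbf{c}, \mathbf{s} \rangle = p \cdot u + a \pmod{\Phi_m(X), q}$, where $u$ has low canonical embedding norm. Then:

$$\langle \mathbf{c}', \mathbf{t} \rangle = \mathbf{t} W \tilde{\mathbf{c}} = \mathbf{t} \begin{pmatrix} \tilde{\mathbf{s}} + p \cdot \mathbf{v} \\ \mathbf{0} \end{pmatrix} \tilde{\mathbf{c}} \overset{(a)}{=} \langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle + p \cdot \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle \overset{(b)}{=} \langle \mathbf{c}, \mathbf{s} \rangle + p \cdot \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle = p \cdot \underbrace{(u + \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle)}_{u'} + a \pmod{q},$$

where the second equality uses $\mathbf{t}R = 0 \pmod{q}$, Equality (a) holds since the first entry of $\mathbf{t}$ is 1, and Equality (b) follows from $\langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle = \langle \mathbf{c}, \mathbf{s} \rangle$. Finally, since both $\mathbf{v}$ and $\tilde{\mathbf{c}}$ have low canonical embedding norm (because they have low coefficient norm), so does $\langle \tilde{\mathbf{c}}, \mathbf{v} \rangle$, and therefore also $u' = \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle + u \bmod (\Phi_m(X), q)$.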
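The whole key-switching step $\mathbf{c}' \leftarrow W\tilde{\mathbf{c}}$ can be run end to end on a deliberately degenerate toy. The sketch below is insecure and only checks correctness: it takes the ring to be $\mathbb{Z}_q$ itself (degree-0 polynomials), so $W$ is an ordinary $2 \times d$ integer matrix, with hypothetical values $q = 12289$, $p = 2$, a 3-entry source key $\mathbf{s}$ and target key $\mathbf{t} = [1, t_1]$. It builds $R$ as the rows $[-t_1 \mathbf{r};\ \mathbf{r}]$ for a random row $\mathbf{r}$, which is one simple way (our choice, not necessarily the report's) to get $\mathbf{t}R = 0 \pmod{q}$.

```python
import random

Q, P = 12289, 2      # toy modulus and plaintext modulus (hypothetical values)
ELL = Q.bit_length() - 1


def bit_decomp(c):
    # Block i holds bit i of every entry of c (entries reduced mod Q).
    return [((x % Q) >> i) & 1 for i in range(ELL + 1) for x in c]


def powers_of_2(s):
    return [((1 << i) * x) % Q for i in range(ELL + 1) for x in s]


def keyswitch_matrix(s, t1, rng):
    """W = [s~ + p*v ; 0] + R with t*R = 0 (mod Q), where t = [1, t1]."""
    s_tilde = powers_of_2(s)
    d = len(s_tilde)
    v = [rng.choice((-1, 0, 1)) for _ in range(d)]      # low-norm vector v
    r = [rng.randrange(Q) for _ in range(d)]
    row1 = [(st + P * vi - t1 * ri) % Q for st, vi, ri in zip(s_tilde, v, r)]
    row2 = [ri % Q for ri in r]                         # zero row of [s~ + p*v ; 0], plus r
    return [row1, row2], v


def switch_key(W, c):
    c_tilde = bit_decomp(c)
    return [sum(w * x for w, x in zip(row, c_tilde)) % Q for row in W]


def decrypt(c, key):
    val = sum(ci * ki for ci, ki in zip(c, key)) % Q
    if val > Q // 2:
        val -= Q                                        # center mod Q
    return val % P


rng = random.Random(3)
s = [1, rng.choice((-1, 1)), rng.choice((-1, 1))]       # source key, first entry 1
t1 = rng.choice((-1, 1))                                # target key t = [1, t1]
a, u = 1, rng.choice((-1, 0, 1))
c1, c2 = rng.randrange(Q), rng.randrange(Q)
c = [(P * u + a - c1 * s[1] - c2 * s[2]) % Q, c1, c2]   # <c, s> = p*u + a (mod Q)
W, v = keyswitch_matrix(s, t1, rng)
c_new = switch_key(W, c)
print(decrypt(c, s), decrypt(c_new, [1, t1]))           # 1 1
```

The new noise term $p \cdot \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle$ is bounded by $p \cdot d$ here (at most $2 \cdot 42$), which matches the argument above that $u'$ stays small because both $\mathbf{v}$ and $\tilde{\mathbf{c}}$ are small.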

Galois Group Actions. Recall that a Galois group action is obtained by applying the transformation $f(X) \mapsto f(X^i) \bmod (\Phi_m(X), q)$ for some $i \in \mathbb{Z}_m^*$ to all the polynomials in our ciphertext vectors, secret keys, etc. Assume that we have $\langle \mathbf{c}, \mathbf{s} \rangle = p \cdot u + a \pmod{\Phi_m(X), q}$, and define $\mathbf{c}^{(i)}, \mathbf{s}^{(i)}, u^{(i)}, a^{(i)}$ as what you get by applying the above Galois group action to $\mathbf{c}, \mathbf{s}, u, a$, respectively. Our invariant means that for some polynomial $k \in \mathbb{Z}_q[X]$ we have

$$\sum_j c_j(X) s_j(X) = p \cdot u(X) + a(X) + k(X)\Phi_m(X) \quad (\text{equality in } \mathbb{Z}_q[X]), \tag{3}$$

and therefore also for every $i$

$$\sum_j c_j(X^i) s_j(X^i) = p \cdot u(X^i) + a(X^i) + k(X^i)\Phi_m(X^i) \quad (\text{equality in } \mathbb{Z}_q[X]). \tag{4}$$

Equation (4) follows since the two sides of Equation (3) are identical as formal polynomials over $\mathbb{Z}_q$, and therefore they must coincide also as functions over any characteristic-$q$ field. It follows that the functions on both sides of Equation (4) must also coincide over any characteristic-$q$ field, and therefore the two sides must be identical as formal polynomials over $\mathbb{Z}_q$.

Recalling that if $i \in \mathbb{Z}_m^*$ then $\Phi_m(X)$ divides $\Phi_m(X^i)$, we obtain

$$\sum_j c_j(X^i) s_j(X^i) = p \cdot u(X^i) + a(X^i) = p \cdot u^{(i)} + a^{(i)} \pmod{\Phi_m(X), q},$$

as needed. Observing that for $i \in \mathbb{Z}_m^*$ the canonical embeddings of $u$ and $u^{(i)}$ are just a permutation of each other (and hence have the same norm), we deduce that our invariant is maintained under the transformation $X \mapsto X^i$ whenever $i \in \mathbb{Z}_m^*$.
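The divisibility fact used above, $\Phi_m(X) \mid \Phi_m(X^i)$ for $i \in \mathbb{Z}_m^*$, is easy to confirm by machine for small $m$. The sketch below computes $\Phi_m$ by dividing $X^m - 1$ by $\Phi_d$ for every proper divisor $d$ of $m$, then checks that $\Phi_m(X^i)$ leaves zero remainder modulo $\Phi_m(X)$. All helper names are our own; exact integer long division works because cyclotomic polynomials are monic.

```python
from functools import lru_cache
from math import gcd


def poly_divmod(num, den):
    """Long division of integer polynomials (lowest degree first); den must be monic."""
    num = list(num)
    assert den[-1] == 1
    quot = [0] * max(len(num) - len(den) + 1, 1)
    for k in range(len(num) - len(den), -1, -1):
        coef = num[k + len(den) - 1]
        quot[k] = coef
        for j, dj in enumerate(den):
            num[k + j] -= coef * dj
    return quot, num[:len(den) - 1]


@lru_cache(maxsize=None)
def cyclotomic(m):
    """Phi_m(X) as a tuple of integer coefficients, via X^m - 1 = prod_{d | m} Phi_d(X)."""
    poly = [-1] + [0] * (m - 1) + [1]               # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly, rem = poly_divmod(poly, list(cyclotomic(d)))
            assert not any(rem)
    return tuple(poly)


def substitute_power(f, i):
    """Return the coefficients of f(X^i)."""
    out = [0] * ((len(f) - 1) * i + 1)
    for k, fk in enumerate(f):
        out[k * i] = fk
    return out


def phi_divides_phi_of_power(m, i):
    phi = list(cyclotomic(m))
    _, rem = poly_divmod(substitute_power(phi, i), phi)
    return not any(rem)


print(cyclotomic(8))     # (1, 0, 0, 0, 1), i.e. X^4 + 1
print(all(phi_divides_phi_of_power(12, i) for i in range(1, 12) if gcd(i, 12) == 1))  # True
```

The coprimality condition matters: for example $\Phi_8(X^2) = X^8 + 1 \equiv 2 \pmod{X^4 + 1}$, so $X \mapsto X^2$ does not preserve the ideal generated by $\Phi_8$.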


Modulus Switching. Our modulus switching procedure works exactly as in the BGV cryptosystem. Namely, to switch a ciphertext $\mathbf{c}$ (in coefficient representation) from $q_i$ to $q_{i+1}$, we just scale the coefficient vectors in $\mathbf{c}$ by a $q_{i+1}/q_i$ factor, and then round the result to get an integer polynomial vector $\mathbf{c}'$ such that $\mathbf{c}' \equiv \mathbf{c} \pmod{p}$.

Definition (Scale). For a vector $\mathbf{c}$ over $\mathbb{Z}[X]/\Phi_m(X)$ and integers $q_i > q_{i+1} > p$, define $\mathbf{c}' \leftarrow \mathrm{Scale}(\mathbf{c}, q_i, q_{i+1}, p)$ to be the vector over $\mathbb{Z}[X]/\Phi_m(X)$ closest to $(q_{i+1}/q_i) \cdot \mathbf{c}$ (in coefficient representation) that satisfies $\mathbf{c}' \equiv \mathbf{c} \pmod{p}$.

Our analysis, however, is a little different than in [3]. The proof from [3, Lemma 4] relies on the fact that the coefficient vector of $[\langle \mathbf{c}, \mathbf{s} \rangle]_{q_i}$ has low norm, whereas in our case we instead have that this polynomial has low canonical embedding norm mod $q_i$. We therefore re-prove this lemma under our new condition.


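Lemma 13. Let $q_i > q_{i+1} > p$ be positive integers satisfying $q_i \equiv q_{i+1} \equiv 1 \pmod{p}$. Let $\mathbf{c}, \mathbf{s}$ be two $n$-vectors over $\mathbb{Z}[X]/\Phi_m(X)$ such that $|[\langle \mathbf{c}, \mathbf{s} \rangle]_{q_i}|^{\mathrm{can}} < q_i/2 - (q_i/q_{i+1}) \cdot pn \cdot \phi(m) \cdot \|\mathbf{s}\|^{\mathrm{can}}$, and let $\mathbf{c}' = \mathrm{Scale}(\mathbf{c}, q_i, q_{i+1}, p)$. Denoting $e = \langle \mathbf{c}, \mathbf{s} \rangle \bmod \Phi_m(X)$ and $e' = \langle \mathbf{c}', \mathbf{s} \rangle \bmod \Phi_m(X)$ (arithmetic in $\mathbb{Z}[X]/\Phi_m(X)$), it holds that $[e']_{q_{i+1}} \equiv [e]_{q_i} \pmod{p}$ (in coefficient representation), and

$$|[e']_{q_{i+1}}|^{\mathrm{can}} \le \frac{q_{i+1}}{q_i} \cdot |[e]_{q_i}|^{\mathrm{can}} + pn \cdot \phi(m) \cdot \|\mathbf{s}\|^{\mathrm{can}}.$$

Proof. For some $k \in \mathbb{Z}[X]/\Phi_m(X)$, we have $[e]_{q_i} = \langle \mathbf{c}, \mathbf{s} \rangle - q_i k$, where the equality is over $\mathbb{Z}[X]/\Phi_m(X)$. For the same $k$, let $e'' = e' - q_{i+1} k \in \mathbb{Z}[X]/\Phi_m(X)$. Since $\mathbf{c}' \equiv \mathbf{c} \pmod{p}$ and $q_i \equiv q_{i+1} \pmod{p}$, then also

$$e'' = \langle \mathbf{c}', \mathbf{s} \rangle - q_{i+1} k \equiv \langle \mathbf{c}, \mathbf{s} \rangle - q_i k = [e]_{q_i} \pmod{\Phi_m(X), p}.$$

It therefore suffices to prove that $e'' = [e']_{q_{i+1}}$ (equality over $\mathbb{Z}[X]/\Phi_m(X)$) and that it has small enough norm.

Denote the distance between $\frac{q_{i+1}}{q_i} \cdot \mathbf{c}$ and its rounded version $\mathbf{c}'$ by $\boldsymbol{\delta} = \mathbf{c}' - \frac{q_{i+1}}{q_i} \cdot \mathbf{c}$. Then $\boldsymbol{\delta}$ is a vector over $\mathbb{Q}[X]/\Phi_m(X)$, and the coefficient vectors in $\boldsymbol{\delta}$ all have entries in $[-p/2, p/2)$. Moreover, we have

$$e'' = \langle \mathbf{c}', \mathbf{s} \rangle - q_{i+1} k = \frac{q_{i+1}}{q_i} \langle \mathbf{c}, \mathbf{s} \rangle + \langle \boldsymbol{\delta}, \mathbf{s} \rangle - q_{i+1} k = \frac{q_{i+1}}{q_i} \big( \langle \mathbf{c}, \mathbf{s} \rangle - q_i k \big) + \langle \boldsymbol{\delta}, \mathbf{s} \rangle = \frac{q_{i+1}}{q_i} [e]_{q_i} + \langle \boldsymbol{\delta}, \mathbf{s} \rangle. \tag{5}$$

Considering the polynomial $\langle \boldsymbol{\delta}, \mathbf{s} \rangle \in \mathbb{Q}[X]/\Phi_m(X)$, we can bound its canonical embedding norm by:

$$|\langle \boldsymbol{\delta}, \mathbf{s} \rangle|^{\mathrm{can}} \le n \cdot \max_j \big( |\delta_j|^{\mathrm{can}} \cdot |s_j|^{\mathrm{can}} \big) \le n \cdot \phi(m) \cdot \|\boldsymbol{\delta}\|_\infty \cdot \|\mathbf{s}\|^{\mathrm{can}} \le pn \cdot \phi(m) \cdot \|\mathbf{s}\|^{\mathrm{can}}.$$

From Equation (5) we now get:

$$\begin{aligned}
|e''|^{\mathrm{can}} &\le \frac{q_{i+1}}{q_i} |[e]_{q_i}|^{\mathrm{can}} + |\langle \boldsymbol{\delta}, \mathbf{s} \rangle|^{\mathrm{can}} \le \frac{q_{i+1}}{q_i} |[e]_{q_i}|^{\mathrm{can}} + pn \cdot \phi(m) \cdot \|\mathbf{s}\|^{\mathrm{can}} \\
&< \frac{q_{i+1}}{q_i} \Big( \frac{q_i}{2} - \frac{q_i}{q_{i+1}} \cdot pn \cdot \phi(m) \cdot \|\mathbf{s}\|^{\mathrm{can}} \Big) + pn \cdot \phi(m) \cdot \|\mathbf{s}\|^{\mathrm{can}} = \frac{q_{i+1}}{2}.
\end{aligned} \tag{6}$$

Finally, Lemma 11 implies that $e'' = [e']_{q_{i+1}}$, completing the proof. □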


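It follows immediately from Lemma 13 that if $\mathbf{c}$ satisfies our invariant with respect to $\mathbf{s}$ and $q_i$, and if the canonical embedding norm of $\mathbf{s}$ is small enough so that we have $|[\langle \mathbf{c}, \mathbf{s} \rangle]_{q_i}|^{\mathrm{can}} < q_i/2 - (q_i/q_{i+1}) \cdot pn \cdot \phi(m) \cdot \|\mathbf{s}\|^{\mathrm{can}}$, then the scaled vector $\mathbf{c}' = \mathrm{Scale}(\mathbf{c}, q_i, q_{i+1}, p)$ satisfies our invariant with respect to the same $\mathbf{s}$ and the new modulus $q_{i+1}$.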
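The Scale operation and the guarantee of Lemma 13 can be checked on a toy instance. The sketch below assumes $\Phi_8(X) = X^4 + 1$, $p = 3$, and two hypothetical moduli $q_i = 1000003$ and $q_{i+1} = 1009$ (both $\equiv 1 \pmod 3$, as the lemma asks). It builds $\mathbf{c} = [c_0, c_1]$ with $\langle \mathbf{c}, [1, s_1] \rangle \equiv p \cdot u + a \pmod{q_i}$ directly rather than by encryption, rounds coefficient-wise with exact rationals, and checks that decryption is unchanged after switching to $q_{i+1}$. Writing the result as $x + p \cdot \mathrm{round}((t - x)/p)$ for target $t$ is one way to get the closest value congruent to $x$ mod $p$; it is our implementation choice.

```python
import random
from fractions import Fraction

PHI_DEG, P = 4, 3
Q_I, Q_NEXT = 1000003, 1009    # toy moduli, both = 1 (mod 3) as Lemma 13 asks


def poly_mul(f, g):
    """Negacyclic product in Z[X]/(X^4 + 1) (no modular reduction)."""
    res = [0] * PHI_DEG
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < PHI_DEG:
                res[i + j] += fi * gj
            else:
                res[i + j - PHI_DEG] -= fi * gj
    return res


def scale(poly, q_from, q_to, p):
    """Closest integer polynomial to (q_to/q_from)*poly that is congruent to poly mod p."""
    out = []
    for x in poly:
        target = Fraction(x * q_to, q_from)
        out.append(x + p * round((target - x) / p))   # shift x by a multiple of p
    return out


def decrypt(ct, s1, q):
    """Secret key [1, s1]: center <c, s> mod q, then reduce mod p."""
    c0, c1 = ct
    inner = [x + y for x, y in zip(c0, poly_mul(c1, s1))]
    return [((x % q) - q if x % q > q // 2 else x % q) % P for x in inner]


rng = random.Random(11)
s1 = [rng.choice((-1, 0, 1)) for _ in range(PHI_DEG)]
a = [rng.randrange(P) for _ in range(PHI_DEG)]
u = [rng.choice((-1, 0, 1)) for _ in range(PHI_DEG)]
c1 = [rng.randrange(Q_I) for _ in range(PHI_DEG)]
c0 = [(P * ui + ai - x) % Q_I for ui, ai, x in zip(u, a, poly_mul(c1, s1))]
ct = (c0, c1)
ct_next = tuple(scale(poly, Q_I, Q_NEXT, P) for poly in ct)
print(decrypt(ct, s1, Q_I) == a, decrypt(ct_next, s1, Q_NEXT) == a)  # True True
```

Here the rounding error $\langle \boldsymbol{\delta}, \mathbf{s} \rangle$ contributes at most $1.5 + 4 \cdot 1.5 = 7.5$ per coefficient, well below $q_{i+1}/2$, in line with the additive term in the lemma.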


Variants. We note that one can optimize BGV key generation and encryption using a cute trick by Brakerski and Vaikuntanathan [5] (following [15]). This reduces the public key size and encryption time, without changing the scheme in any way that affects the applicability of our techniques; we still obtain FHE with polylog overhead using BGV with BV's optimizations. (We note that our techniques can be applied to the cryptosystem of BV [5] as well, but one needs to use BGV's noise management technique to reduce the overhead to polylog.)

In BV key generation [5], for level 0, one only needs to choose low-norm elements $s_0, e_0 \in \mathbb{Z}[X]/\Phi_m(X)$ (with coefficient norm $\ll q_0$) as well as a random element $a_0 \in \mathbb{Z}_{q_0}[X]/\Phi_m(X)$, and compute $\beta_0 \leftarrow -a_0 s_0 + p \cdot e_0 \bmod (\Phi_m(X), q_0)$. The level-0 secret key is $\mathbf{s}_0 = [1, s_0]$, and the corresponding public encryption key is $\mathbf{b} = [\beta_0, a_0]$. This approach reduces level-0 key size by a factor of $O(\log q_0)$. One generates keys for the other levels similarly.

In BV encryption, an aggregate plaintext $a \in \mathbb{Z}_p[X]/\Phi_m(X)$ is encrypted by choosing three random short elements $r, e_1, e_2 \in \mathbb{Z}_{q_0}[X]/\Phi_m(X)$ and setting

$$\mathbf{c} = [c_0, c_1] \leftarrow [r\beta_0, r a_0] + p \cdot [e_1, e_2] + [a, 0] \bmod (\Phi_m(X), q_0). \tag{7}$$

It is easy to show that semantic security reduces to the hardness of the decision ring-LWE problem for the ring $\mathbb{Z}_q[X]/\Phi_m(X)$ and the distributions used to sample $s_0$, $r$, and $e_0, e_1, e_2$.

To see that our invariant holds with respect to the level-0 secret key $\mathbf{s}_0$ and freshly encrypted ciphertexts, note that Equation (7) implies that $\mathbf{c} = [r\beta_0, r a_0] + p \cdot [e_1, e_2] + [a, 0] \pmod{\Phi_m(X), q_0}$, and therefore

$$\begin{aligned}
\langle \mathbf{c}, \mathbf{s}_0 \rangle &= r\beta_0 + p e_1 + a + s_0 (r a_0 + p e_2) = -r s_0 a_0 + p r e_0 + p e_1 + a + s_0 (r a_0 + p e_2) \\
&= p \cdot (r e_0 + e_1 + s_0 e_2) + a \pmod{q_0},
\end{aligned}$$

and the polynomial $u = (r e_0 + e_1 + s_0 e_2) \bmod (X^m - 1, q_0)$ has low coefficient norm, and therefore also low canonical embedding norm. When using BV encryption and key generation, the other aspects of the scheme remain the same.
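As a sanity check of the signs in the derivation above, the following sketch builds a BV-style ciphertext from Equation (7) with $\beta_0 = -a_0 s_0 + p \cdot e_0$ and confirms that $\langle \mathbf{c}, \mathbf{s}_0 \rangle \equiv p \cdot (r e_0 + e_1 + s_0 e_2) + a \pmod{q_0}$. It is an insecure toy with hypothetical names and parameters: the ring $\mathbb{Z}_{q_0}[X]/(X^4 + 1)$ with $q_0 = 7681$ and $p = 3$.

```python
import random

PHI_DEG, Q0, P = 4, 7681, 3     # toy Z_q0[X]/(X^4 + 1) and p (hypothetical values)


def poly_mul(f, g):
    """Negacyclic product in Z_q0[X]/(X^4 + 1)."""
    res = [0] * PHI_DEG
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < PHI_DEG:
                res[i + j] += fi * gj
            else:
                res[i + j - PHI_DEG] -= fi * gj
    return [x % Q0 for x in res]


def add(*polys):
    return [sum(cs) % Q0 for cs in zip(*polys)]


def scal(k, f):
    return [(k * x) % Q0 for x in f]


def small(rng):
    return [rng.choice((-1, 0, 1)) for _ in range(PHI_DEG)]


rng = random.Random(9)
s0, e0, r, e1, e2 = (small(rng) for _ in range(5))
a0 = [rng.randrange(Q0) for _ in range(PHI_DEG)]
msg = [rng.randrange(P) for _ in range(PHI_DEG)]

beta0 = add(scal(-1, poly_mul(a0, s0)), scal(P, e0))   # beta0 = -a0*s0 + p*e0
c0 = add(poly_mul(r, beta0), scal(P, e1), msg)          # Equation (7), first entry
c1 = add(poly_mul(r, a0), scal(P, e2))                  # Equation (7), second entry

lhs = add(c0, poly_mul(c1, s0))                          # <c, [1, s0]>
u = add(poly_mul(r, e0), e1, poly_mul(s0, e2))           # u = r*e0 + e1 + s0*e2
rhs = add(scal(P, u), msg)
print(lhs == rhs)   # True
```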
E A Delayed-Reduction Technique

We describe here another variant, where we work with polynomials modulo $X^m - 1$ rather than polynomials modulo $\Phi_m(X)$, and reduce back mod $\Phi_m(X)$ only upon decryption. Importantly, we still want to base our security on the hardness of ring-LWE with respect to the ring $\mathbb{Z}_q[X]/\Phi_m(X)$ (recall that decision ring-LWE is easy modulo $X^m - 1$, since it can be reduced to the one-dimensional problem modulo $X - 1$).

We can use Lemma 12 to “lift” the mod-$\Phi_m(X)$ polynomials in the cryptosystem into mod-$(X^m - 1)$ polynomials, simply by multiplying by the polynomial $G(X)$ from that lemma. (This has the effect of introducing an extra multiplicative factor of $m$, which we can correct upon decryption.) Note that since $G \equiv 0 \pmod{(X^m - 1)/\Phi_m}$, we can write $G(X) = \frac{X^m - 1}{\Phi_m(X)} \cdot h(X)$ (equality over $\mathbb{Z}[X]$) for some integer polynomial $h$. It follows that if we have two polynomials satisfying $u \equiv v \pmod{\Phi_m}$, then $Gu \equiv Gv \pmod{X^m - 1}$. This is because over $\mathbb{Z}[X]/(X^m - 1)$ we have $u = v + \tau\Phi_m$ for some integer polynomial $\tau$, and so

$$Gu = G(v + \tau\Phi_m) = Gv + \Big(\frac{X^m - 1}{\Phi_m} \cdot h\Big) \cdot \tau\Phi_m = Gv + (X^m - 1) \cdot h\tau \equiv Gv \pmod{X^m - 1}.$$

In our variant of the BGV cryptosystem, ciphertexts are vectors over the ring $\mathbb{Z}[X]/(X^m - 1)$, secret keys are vectors over the sub-ring $\mathbb{Z}[X]/\Phi_m$, and aggregate plaintexts are elements in $\mathbb{Z}_p[X]/\Phi_m$. We maintain the invariant that if $\mathbf{c}$ is a ciphertext encrypting the aggregate plaintext $a$ relative to secret key $\mathbf{s}$ and modulus $q$, then in the ring $\mathbb{Z}_q[X]/(X^m - 1)$ we have the equality

$$G \cdot \langle \mathbf{c}, \mathbf{s} \rangle = p \cdot G \cdot u + G \cdot a \pmod{X^m - 1, q}, \tag{8}$$

where $u \in \mathbb{Z}[X]/(X^m - 1)$ has a coefficient vector with small $\ell_2$-norm, $\|u\|_2 \ll q$. Note that we can use $\mathbf{s}$ to decrypt $\mathbf{c}$ by setting $b \leftarrow G \cdot \langle \mathbf{c}, \mathbf{s} \rangle \bmod (X^m - 1, q)$, then recovering $a = m^{-1} \cdot b \bmod (\Phi_m, p)$. Since both $b$ and $p \cdot Gu + Ga \pmod{X^m - 1}$ have coefficients smaller than $q/2$ in absolute value, we have the equality $b = p \cdot Gu + Ga$ holding over $\mathbb{Z}[X]/(X^m - 1)$, without reduction modulo $q$. We thus have $b \equiv Ga \pmod{X^m - 1, p}$, so also $b \equiv Ga \equiv m \cdot a \pmod{\Phi_m, p}$, and indeed $a = b \cdot m^{-1} \pmod{\Phi_m, p}$.
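Lemma 12 and its polynomial $G(X)$ are not reproduced in this section, so the sketch below makes a simplifying assumption: for prime $m$, the polynomial $G = m - \Phi_m(X)$ satisfies the two properties used in the text ($G \equiv m \bmod \Phi_m$, and $G \equiv 0 \bmod (X^m - 1)/\Phi_m = X - 1$), and we use it in place of the lemma's $G$. With toy values $m = 7$, $p = 5$, $q = 2^{20}$ (hypothetical and insecure), it builds a ciphertext satisfying Equation (8) directly and runs the decryption procedure described above: compute $b \leftarrow G \cdot \langle \mathbf{c}, \mathbf{s} \rangle \bmod (X^m - 1, q)$, center it, reduce mod $(\Phi_m, p)$, and multiply by $m^{-1}$.

```python
import random

M, P, Q = 7, 5, 1 << 20          # prime m, gcd(m, p) = 1, toy q (hypothetical values)
PHI = [1] * M                    # Phi_7(X) = 1 + X + ... + X^6 (m prime)
G = [M - 1] + [-1] * (M - 1)     # G = m - Phi_m(X): G = m mod Phi_m and G(1) = 0


def cyc_mul(f, g, q=None):
    """Multiply in Z[X]/(X^M - 1) (cyclic convolution); optionally reduce mod q."""
    res = [0] * M
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            res[(i + j) % M] += fi * gj
    return [x % q for x in res] if q else res


def reduce_mod_phi(f):
    """Reduce a length-M coefficient vector modulo the monic degree-(M-1) Phi_M."""
    top = f[M - 1]
    return [x - top for x in f[:M - 1]]


def decrypt(c, s):
    """b <- G*<c, s> mod (X^m - 1, q) (centered); then a = m^{-1} * b mod (Phi_m, p)."""
    inner = [0] * M
    for cj, sj in zip(c, s):
        inner = [x + y for x, y in zip(inner, cyc_mul(cj, sj))]
    b = [x - Q if x > Q // 2 else x for x in cyc_mul(G, inner, Q)]
    m_inv = pow(M, -1, P)
    return [(m_inv * x) % P for x in reduce_mod_phi(b)]


rng = random.Random(4)
s1 = [rng.choice((-1, 0, 1)) for _ in range(M - 1)] + [0]   # element of Z[X]/Phi_m
a = [rng.randrange(P) for _ in range(M - 1)] + [0]          # plaintext in Z_p[X]/Phi_m
u = [rng.choice((-1, 0, 1)) for _ in range(M)]              # small noise mod X^m - 1
c1 = [rng.randrange(Q) for _ in range(M)]
c0 = [(P * ui + ai - x) % Q for ui, ai, x in zip(u, a, cyc_mul(c1, s1))]
print(decrypt([c0, c1], [[1] + [0] * (M - 1), s1]) == a[:M - 1])   # True
```

For this particular $G$ the lifting property can also be seen directly: $G \cdot \Phi_m = m\Phi_m - \Phi_m^2 \equiv 0 \pmod{X^m - 1}$, because $\Phi_m^2 \equiv m \cdot \Phi_m$ modulo $X^m - 1$ when $m$ is prime.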

Having described decryption, we now proceed to describe all the other elements of our cryptosystem, namely key generation, encryption, addition, “raw multiplication”, key switching, modulus switching, and Galois group actions. All these components (bar the last) are very similar to their counterparts in the BGV cryptosystem [3], except that we use some mix of mod-$\Phi_m$ and mod-$(X^m - 1)$ arithmetic, using multiplication by $G$ and Equation (8) to move between them.


E.1 Key generation

The parameters of the scheme include the integer $m$ (that defines the polynomials $\Phi_m$ and $X^m - 1$), the integer $p$ (that defines the aggregate plaintext space $\mathbb{Z}_p[X]/\Phi_m$), and the sequence of moduli $q_0 > q_1 > \cdots > q_L$.

Key generation is as in the ring-LWE-based version of BGV [3] over the ring $\mathbb{Z}[X]/\Phi_m$. That is, for appropriate $N = \mathrm{polylog}(q_0, m)$, one chooses low-norm elements $s_0, e_{0,1}, \ldots, e_{0,N} \in \mathbb{Z}[X]/\Phi_m$ (with $\ell_2$ norm $\ll q_0$) as well as random elements $a_{0,1}, \ldots, a_{0,N} \in \mathbb{Z}_{q_0}[X]/\Phi_m$, and computes $\beta_{0,i} \leftarrow a_{0,i} s_0 + p \cdot e_{0,i} \bmod (\Phi_m, q_0)$. The level-0 secret key is $\mathbf{s}_0 = [1, s_0]$, and the corresponding public encryption key includes the vectors $\mathbf{b}_i = [\beta_{0,i}, -a_{0,i}]$. In addition to these keys, the key-generation procedure chooses other secret key vectors for the other levels, and generates the key-switching matrices between them, as described in Section E.5 below.
E.2 Encryption

Encryption is as in BGV. An aggregate plaintext $a \in \mathbb{Z}_p[X]/\Phi_m(X)$ is encrypted by choosing random short elements $r_1, \ldots, r_N \in \mathbb{Z}[X]/\Phi_m$ and setting

$$\mathbf{c} = [c_0, c_1] \leftarrow [a, 0] + \sum_{i=1}^{N} r_i \cdot \mathbf{b}_i \bmod (\Phi_m, q_0). \tag{9}$$

(Actually, the $r_i$'s can be chosen as elements of $\mathbb{Z}[X]/\Phi_m$ with 0/1 coefficients, versus merely being short.)

Note that freshly encrypted ciphertexts are vectors over the sub-ring $\mathbb{Z}[X]/\Phi_m(X)$, but later we allow evaluated ciphertexts to be in the larger ring $\mathbb{Z}[X]/(X^m - 1)$. It is easy to show that semantic security reduces to the hardness of the decision ring-LWE problem for the ring $\mathbb{Z}_q[X]/\Phi_m$ and the distributions used to sample the short elements.

To see that our invariant holds with respect to the level-0 secret key $\mathbf{s}_0$ and freshly encrypted ciphertexts, note that Equation (9) implies that $G \cdot \mathbf{c} = G([a, 0] + \sum_i r_i \cdot \mathbf{b}_i) \pmod{X^m - 1, q_0}$, and therefore

$$\begin{aligned}
G \cdot \langle \mathbf{c}, \mathbf{s}_0 \rangle &= G\Big(a + \sum_{i=1}^{N} r_i \langle \mathbf{s}_0, \mathbf{b}_i \rangle\Big) = G\Big(a + p \cdot \sum_{i=1}^{N} r_i \cdot e_i\Big) \\
&= Ga + p \cdot G\Big(\sum_{i=1}^{N} r_i \cdot e_i\Big) \pmod{X^m - 1, q_0},
\end{aligned}$$

and the coefficient vector of the polynomial $u = \sum_i r_i \cdot e_i \bmod (X^m - 1, q_0)$ has low $\ell_2$ norm.

We stress that the low $\ell_2$ norm of $u$ depends crucially on our delayed reduction. Indeed, each of the polynomials $\{r_i\}$, $\{e_i\}$, $G$ has low $\ell_2$ norm, hence their products and sums over $\mathbb{Z}[X]$ would still have low norms. However, we do not know how to prove that the norm remains low when we reduce them modulo $\Phi_m$; it is only because we reduce modulo $X^m - 1$ that we can argue that the norm remains low.


E.3 Addition

Adding two ciphertext vectors that are defined with respect to the same secret key and modulus is just standard addition in $\mathbb{Z}_q[X]/(X^m - 1)$. Indeed, if we have $G \cdot \langle \mathbf{c}, \mathbf{s} \rangle = p \cdot Gu + Ga$ and $G \cdot \langle \mathbf{c}', \mathbf{s} \rangle = p \cdot Gu' + Ga'$ (both over $\mathbb{Z}_q[X]/(X^m - 1)$), then also $G \cdot \langle \mathbf{c} + \mathbf{c}', \mathbf{s} \rangle = p \cdot G(u + u') + G(a + a')$, and the $\ell_2$ norm of the coefficient vector of $u + u'$ is still small.


E.4 “Raw multiplication”

As in the BV/BGV family of cryptosystems [5, 4, 3], “raw multiplication” of two ciphertext vectors (defined with respect to the same secret key and modulus) is done using tensor product. Namely, if we have ciphertext vector $\mathbf{c}$ which is decrypted to $a$ under $\mathbf{s}$ and $q$, and another vector $\mathbf{c}'$ which is decrypted to $a'$ under $\mathbf{s}$ and $q$, then we set $\tilde{\mathbf{c}} = \mathrm{vector}(\mathbf{c} \otimes \mathbf{c}') \bmod (X^m - 1, q)$ (where $\mathrm{vector}(\cdot)$ opens the matrix into a vector using some appropriate



APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


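ordering). Denoting $\tilde{\mathbf{s}} = \mathrm{vector}(\mathbf{s} \otimes \mathbf{s}) \bmod (\Phi_m, q)$, we thus have

$$\begin{aligned}
G \cdot \langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle &= G \cdot \mathbf{s}^{T} (\mathbf{c} \otimes \mathbf{c}')\, \mathbf{s} = G \cdot \langle \mathbf{c}, \mathbf{s} \rangle \cdot \langle \mathbf{c}', \mathbf{s} \rangle \\
&= (p \cdot Gu + Ga) \cdot \langle \mathbf{c}', \mathbf{s} \rangle = (p \cdot u + a) \cdot G \cdot \langle \mathbf{c}', \mathbf{s} \rangle = (p \cdot u + a) \cdot (p \cdot Gu' + Ga') \\
&= p \cdot G(p u u' + u a' + a u') + G a a' \pmod{X^m - 1, q}.
\end{aligned}$$

Since the coefficient vector of $\tilde{u} = p u u' + u a' + a u' \bmod (X^m - 1, q)$ still has small $\ell_2$ norm, $\tilde{\mathbf{c}}$ is a valid ciphertext with respect to $\tilde{\mathbf{s}}$ and $q$, which is decrypted to $a a'$. Note that above we used mod-$(X^m - 1)$ arithmetic for the ciphertext and mod-$\Phi_m$ arithmetic for the secret key. This choice was made for convenience in other operations.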

E.5 Key switching

A crucial component of the BV/BGV cryptosystems is the ability to translate a ciphertext with respect to one secret key into a ciphertext that decrypts to the same thing under another secret key. This is used, for example, to translate the “extended ciphertext” that we get from raw multiplication back to a normal ciphertext, or to translate two ciphertext vectors with respect to different keys into ciphertexts with respect to the same key, so that they can be added or raw-multiplied.

Let $\mathbf{s}$ be a secret-key vector over $\mathbb{Z}_q[X]/\Phi_m$, and consider another 2-element secret-key vector $\mathbf{t} \in (\mathbb{Z}_q[X]/\Phi_m)^2$ whose first entry is 1. To allow translation from $\mathbf{s}$-ciphertexts to $\mathbf{t}$-ciphertexts, we first encode $\mathbf{s}$ in a redundant manner by computing $2^i \mathbf{s} \bmod q$ for $i = 0, 1, \ldots, \ell = \lfloor \log q \rfloor$ and concatenating all these vectors to form

$$\tilde{\mathbf{s}} = \mathrm{Powersof2}_q(\mathbf{s}) = [\mathbf{s} \mid 2\mathbf{s} \mid 4\mathbf{s} \mid \cdots \mid 2^{\ell}\mathbf{s}] \bmod q.$$

Then we choose a random low-$\ell_2$-norm vector $\mathbf{v}$ over $\mathbb{Z}_q[X]/\Phi_m$ of the same dimension as $\tilde{\mathbf{s}}$ (call this dimension $d$), and a matrix $R \in (\mathbb{Z}_q[X]/\Phi_m)^{2 \times d}$ which is chosen at random from the orthogonal space to $\mathbf{t}$, namely $\mathbf{t}R = 0 \pmod{\Phi_m, q}$. The key-switching matrix from $\mathbf{s}$ to $\mathbf{t}$ is then set as

$$W = W[\mathbf{s} \to \mathbf{t}] = \begin{pmatrix} \tilde{\mathbf{s}} + p \cdot \mathbf{v} \\ \mathbf{0} \end{pmatrix} + R \bmod (\Phi_m, q).$$

Again it is easy to show that if decision ring-LWE is hard for the ring $\mathbb{Z}_q[X]/\Phi_m(X)$ and the distributions used to sample $\mathbf{t}$ and $\mathbf{v}$, then the matrix $W$ above is pseudo-random, even for someone who knows $\mathbf{s}$.

Given a ciphertext vector $\mathbf{c}$ (over $\mathbb{Z}_q[X]/(X^m - 1)$) that satisfies our invariant with respect to $\mathbf{s}$ and $q$, we use $W$ to translate it into another vector $\mathbf{c}'$ that satisfies our invariant with respect to $\mathbf{t}$ and $q$, as follows. First, for $i = 0, 1, \ldots, \ell = \lfloor \log q \rfloor$ we denote by $\mathbf{c}_i$ the vector over $\mathbb{Z}_2[X]/(X^m - 1)$ containing the $i$'th bits from all the coefficients of all the entries of $\mathbf{c}$. Namely:

$$\mathbf{c}_0 = \mathbf{c} \bmod 2, \quad\text{and}\quad \mathbf{c}_i = 2^{-i} \cdot \Big( (\mathbf{c} \bmod 2^{i+1}) - \sum_{j<i} 2^j \mathbf{c}_j \Big) \text{ for } i > 0.$$

Then the bit-decomposition of $\mathbf{c}$ is the concatenation of all these vectors,

$$\tilde{\mathbf{c}} = \mathrm{BitDecomp}(\mathbf{c}) = [\mathbf{c}_0 \mid \mathbf{c}_1 \mid \cdots \mid \mathbf{c}_{\ell}].$$

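Clearly $\tilde{\mathbf{c}}$ has low $\ell_2$ norm, since it is represented by a 0-1 vector, and we have $\langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle = \langle \mathbf{c}, \mathbf{s} \rangle$ over $\mathbb{Z}_q[X]$ (and therefore also over $\mathbb{Z}_q[X]/(X^m - 1)$). Switching keys from $\mathbf{s}$ to $\mathbf{t}$ is done simply by setting $\mathbf{c}' \leftarrow W\tilde{\mathbf{c}} \bmod (X^m - 1, q)$. To see that this maintains our invariant, assume that for some $a \in \mathbb{Z}_p[X]/\Phi_m$ we have $G \cdot \langle \mathbf{c}, \mathbf{s} \rangle = p \cdot Gu + Ga \pmod{X^m - 1, q}$, where the coefficient vector of $u$ has low $\ell_2$ norm. Then:

$$\begin{aligned}
G \cdot \langle \mathbf{c}', \mathbf{t} \rangle = G \cdot \mathbf{t} W \tilde{\mathbf{c}} &\overset{(a)}{=} G \cdot \mathbf{t} \begin{pmatrix} \tilde{\mathbf{s}} + p \cdot \mathbf{v} \\ \mathbf{0} \end{pmatrix} \tilde{\mathbf{c}} \overset{(b)}{=} G \cdot \langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle + p \cdot G \cdot \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle \\
&\overset{(c)}{=} G \cdot \langle \mathbf{c}, \mathbf{s} \rangle + p \cdot G \cdot \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle = p \cdot G \underbrace{(u + \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle)}_{u'} + Ga \pmod{X^m - 1, q},
\end{aligned}$$

where Equality (a) follows since $\mathbf{t}R = 0 \pmod{\Phi_m, q}$ and therefore $G \cdot \mathbf{t}R = 0 \pmod{X^m - 1, q}$, Equality (b) holds since the first entry of $\mathbf{t}$ is 1, and Equality (c) follows from $\langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}} \rangle = \langle \mathbf{c}, \mathbf{s} \rangle$. Finally, since both $\mathbf{v}$ and $\tilde{\mathbf{c}}$ have low $\ell_2$ norm, over $\mathbb{Z}_q[X]/(X^m - 1)$ so does $\langle \tilde{\mathbf{c}}, \mathbf{v} \rangle$, and therefore also $u' = \langle \tilde{\mathbf{c}}, \mathbf{v} \rangle + u \bmod (X^m - 1, q)$.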


E.6 Modulus switching

The modulus-switching procedure is exactly as in the BGV cryptosystem. Note that this procedure does not involve any mod-$\Phi_m$ or mod-$(X^m - 1)$ arithmetic: all we do is take a ciphertext vector $\mathbf{c}$ over $\mathbb{Z}_{q_i}[X]/(X^m - 1)$, scale it down by a factor $q_{i+1}/q_i$, and round to get $\mathbf{c}' = \mathrm{round}_{\mathbf{c}}\big(\frac{q_{i+1}}{q_i} \cdot \mathbf{c}\big)$ such that $\mathbf{c}' \equiv \mathbf{c} \pmod{p}$. The reason that this works in our case is exactly as in BGV; our delayed reduction has no effect here.

E.7 Galois group actions

As described in Section 4.2, applying the action $X \mapsto X^i$ on a ciphertext vector $\mathbf{c}$ over $\mathbb{Z}_q[X]/(X^m - 1)$ requires only a permutation of the coefficients in each of the elements of $\mathbf{c}$ (all of which are degree-$(m - 1)$ polynomials over $\mathbb{Z}_q$).
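To illustrate why the Galois action is cheap in the mod-$(X^m - 1)$ representation, the following sketch (toy values $m = 15$, $q = 97$; helper names are hypothetical) applies $X \mapsto X^i$ by moving coefficient $j$ to position $i \cdot j \bmod m$. It checks two facts the argument relies on: for $i \in \mathbb{Z}_m^*$ this is a permutation of the coefficients (so the $\ell_2$ norm is unchanged), and it respects multiplication in $\mathbb{Z}_q[X]/(X^m - 1)$.

```python
import random
from math import gcd


def galois(f, i, m, q):
    """Apply X -> X^i to f in Z_q[X]/(X^m - 1): coefficient j moves to index i*j mod m."""
    out = [0] * m
    for j, fj in enumerate(f):
        k = (i * j) % m
        out[k] = (out[k] + fj) % q        # collisions can only occur if gcd(i, m) > 1
    return out


def cyc_mul(f, g, m, q):
    """Product in Z_q[X]/(X^m - 1) (cyclic convolution)."""
    res = [0] * m
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            res[(i + j) % m] = (res[(i + j) % m] + fi * gj) % q
    return res


m, q = 15, 97
rng = random.Random(8)
f = [rng.randrange(q) for _ in range(m)]
g = [rng.randrange(q) for _ in range(m)]
units = [i for i in range(1, m) if gcd(i, m) == 1]
print(all(sorted(galois(f, i, m, q)) == sorted(f) for i in units))   # True
print(all(galois(cyc_mul(f, g, m, q), i, m, q)
          == cyc_mul(galois(f, i, m, q), galois(g, i, m, q), m, q) for i in units))  # True
```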

Assume that we have $G \cdot \langle \mathbf{c}, \mathbf{s} \rangle = p \cdot Gu + Ga \pmod{X^m - 1, q}$, and define $\mathbf{c}^{(i)}, u^{(i)}$ as what you get by applying the transformation $X \mapsto X^i$ to $\mathbf{c}, u$, respectively, over $\mathbb{Z}_q[X]/(X^m - 1)$, and $\mathbf{s}^{(i)}, a^{(i)}$ as what you get by applying the transformation $X \mapsto X^i$ to $\mathbf{s}, a$, respectively, over $\mathbb{Z}_q[X]/\Phi_m$. Below we prove that if $i, m$ are co-prime and also $q, m$ are co-prime, then we have $G \cdot \langle \mathbf{c}^{(i)}, \mathbf{s}^{(i)} \rangle = p \cdot G u^{(i)} + G a^{(i)} \pmod{X^m - 1, q}$.

Using $G \equiv m \pmod{\Phi_m}$ and reducing modulo $\Phi_m$ the equality $G \cdot \langle \mathbf{c}, \mathbf{s} \rangle = p \cdot Gu + Ga$, we have $m \cdot \langle \mathbf{c}, \mathbf{s} \rangle = pm \cdot u + ma \pmod{\Phi_m, q}$. Since $m, q$ are co-prime, multiplying by $m^{-1} \pmod{q}$ we get $\langle \mathbf{c}, \mathbf{s} \rangle = p \cdot u + a \pmod{\Phi_m, q}$. Namely, for some polynomial $k \in \mathbb{Z}_q[X]$ we have

$$\sum_j c_j(X) s_j(X) = p \cdot u(X) + a(X) + k(X)\Phi_m(X) \quad (\text{equality in } \mathbb{Z}_q[X]), \tag{10}$$

and therefore also for every $i$

$$\sum_j c_j(X^i) s_j(X^i) = p \cdot u(X^i) + a(X^i) + k(X^i)\Phi_m(X^i) \quad (\text{equality in } \mathbb{Z}_q[X]). \tag{11}$$

Equation (11) follows since the two sides of Equation (10) are identical as formal polynomials over $\mathbb{Z}_q$, and therefore they must coincide also as functions over any characteristic-$q$ field. It follows that the functions on both sides of Equation (11) must also coincide over any characteristic-$q$ field, and therefore the two sides must be identical as formal polynomials over $\mathbb{Z}_q$.

Recalling that if $i \in \mathbb{Z}_m^*$ then $\Phi_m(X)$ divides $\Phi_m(X^i)$, we obtain

$$\sum_j c_j(X^i) s_j(X^i) = p \cdot u(X^i) + a(X^i) = p \cdot u^{(i)} + a^{(i)} \pmod{\Phi_m(X), q}.$$




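Now we can multiply by $G$ to “lift” the equality over $\mathbb{Z}_q[X]/\Phi_m$ to $\mathbb{Z}_q[X]/(X^m - 1)$, and we get

$$G \cdot \langle \mathbf{c}^{(i)}, \mathbf{s}^{(i)} \rangle = p \cdot G u^{(i)} + G a^{(i)} \pmod{X^m - 1, q},$$

as needed. Observing that over $\mathbb{Z}_q[X]/(X^m - 1)$ the coefficient vectors of $u$ and $u^{(i)}$ are just a permutation of each other (and hence have the same $\ell_2$ norm), we deduce that our invariant is maintained under the transformation $X \mapsto X^i$ whenever $i \in \mathbb{Z}_m^*$ and $m \in \mathbb{Z}_q^*$.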
Abstract

We describe a working implementation of leveled homomorphic encryption (with or without bootstrapping) that can evaluate the AES-128 circuit. This implementation is built on top of the HElib library, whose design was inspired by an early version of this work. Our main implementation (without bootstrapping) takes about 4 minutes and 3GB of RAM, running on a small laptop, to evaluate an entire AES-128 encryption operation. Using SIMD techniques, we can process up to 120 blocks in each such evaluation, yielding an amortized rate of just over 2 seconds per block.

For cases where further processing is needed after the AES computation, we describe a different setting that uses bootstrapping. We describe an implementation that lets us process 180 blocks in just over 18 minutes using 3.7GB of RAM on the same laptop, yielding amortized 6 seconds/block. We note that somewhat better amortized per-block cost can be obtained using “byte-slicing” (and maybe also “bit-slicing”) implementations, at the cost of significantly slower wall-clock time for a single evaluation.

In this article we describe many of the optimizations that went into this implementation. These include both AES-specific optimizations, as well as several “generic” tools for FHE evaluation (which are incorporated in the HElib library). The generic tools include (among others) a different variant of the Brakerski-Vaikuntanathan key-switching technique that does not require reducing the norm of the ciphertext vector, and a method of implementing the Brakerski-Gentry-Vaikuntanathan modulus-switching transformation on ciphertexts in CRT representation.
1 Introduction

In his breakthrough result [13], Gentry demonstrated that fully homomorphic encryption was theoretically possible, assuming the hardness of some problems in integer lattices. Since then, many different improvements have been made; for example, authors have proposed new variants, improved efficiency, suggested other hardness assumptions, etc. Some of these works were accompanied by implementations [28, 14, 8, 29, 21, 9], but these implementations were either “proofs of concept” that can compute only one basic operation at a time (at great cost), or special-purpose implementations limited to evaluating very simple functions. In the early version of this work we reported on the first implementation powerful enough to support an “interesting real world circuit,” specifically the AES-128 encryption operation. To this end, we implemented a variant of the leveled FHE-without-bootstrapping scheme of Brakerski, Gentry, and Vaikuntanathan [5] (BGV). In the current article we report on an updated implementation of the same circuit, using the “general purpose” open-source HElib library [18], whose design was inspired by that early version of our work. (As of December 2014, we made our new implementation available as part of HElib.)

Why AES? We chose to shoot for an evaluation of AES since it seems like a natural benchmark: AES is widely deployed and used extensively in security-aware applications (so it is “practically relevant” to implement it), and the AES circuit is nontrivial on one hand, but on the other hand not astronomical. Moreover, the AES circuit has a regular (and quite “algebraic”) structure, which is amenable to parallelism and optimizations. Indeed, for these same reasons AES is often used as a benchmark for implementations of protocols for secure multi-party computation (MPC), for example [26, 10, 19, 20]. Using the same yardstick to measure FHE and MPC protocols is quite natural, since these techniques target similar application domains and in some cases both techniques can be used to solve the same problem.

Beyond being a natural benchmark, homomorphic evaluation of AES decryption also has interesting applications: when data is encrypted under AES and we want to compute on that data, homomorphic AES decryption would transform this AES-encrypted data into FHE-encrypted data, and then we could perform whatever computation we wanted. (Such applications were alluded to in [21, 29, 6].)

Why BGV? Our implementation is based on the (ring-LWE-based) BGV cryptosystem [5], which is one of the few variants that seem the most likely to yield “somewhat practical” homomorphic encryption. Other variants are the NTRU-like cryptosystem of López-Alt et al. [23] and the ring-LWE-based scale-invariant cryptosystem of Brakerski [4]. These three variants offer somewhat different implementation tradeoffs, but they all have similar performance characteristics. We don't expect the differences between these variants to be very significant, and moreover most of our optimizations for BGV are useful also for the other two variants. (Another interesting approach is to implement the newer cryptosystem of Gentry et al. [16], or some combination thereof.)

Contributions  of  this  work.  Our  implementation  is  based  on  a  variant  of  the  BGV  scheme  [5,  7,  6]  (based 
on  ring-EWE  [24]),  using  the  techniques  of  Smart  and  Vercauteren  (SV)  [29]  and  Gentry,  Halevi  and  Smart 
(GHS)  [15],  and  we  introduce  many  new  optimizations.  Some  of  our  optimizations  are  specific  to  AES, 
these  are  described  in  Section  4.  Most  of  our  optimization,  however,  are  more  general-purpose  and  can  be 
used  for  homomorphic  evaluation  of  other  circuits,  these  are  described  in  Section  3. 

Many of our general-purpose optimizations are aimed at reducing the number of FFTs and CRTs that we need to perform, by reducing the number of times that we need to convert polynomials between coefficient and evaluation representations. Since the cryptosystem is defined over a polynomial ring, many of
the operations involve various manipulations of integer polynomials, such as modular multiplications and additions and Frobenius maps. Most of these operations can be performed more efficiently in evaluation representation, where a polynomial is represented by the vector of values that it assumes at all the roots of the ring polynomial (for example, polynomial multiplication is just point-wise multiplication of the evaluation values). On the other hand, some operations in BGV-type cryptosystems (such as key switching and modulus switching) seem to require coefficient representation, where a polynomial is represented by listing all its coefficients.¹ Hence a “naive implementation” of FHE would need to convert the polynomials back and forth between the two representations, and these conversions turn out to be the most time-consuming part of the execution. In our implementation we keep ciphertexts in evaluation representation at all times, converting to coefficient representation only when needed for some operation, and then converting back.
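To make the cost argument concrete, here is a toy Python sketch (not HElib code) with the illustrative parameters $m = 8$, $\Phi_8(X) = X^4 + 1$ and $q = 17$, chosen because 2 is a primitive 8th root of unity modulo 17. Multiplying in coefficient representation needs a full negacyclic convolution, while in evaluation representation it is one product per evaluation point.

```python
Q = 17                                          # toy prime with 8 | Q - 1
N = 4                                           # phi(8) = 4; ring Z_Q[X]/(X^4 + 1)
ROOTS = [pow(2, i, Q) for i in (1, 3, 5, 7)]    # primitive 8th roots of unity mod 17

def mul_coeff(a, b):
    """Product in Z_Q[X]/(X^N + 1): a negacyclic convolution, O(N^2) work."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] = (out[i + j] + ai * bj) % Q
            else:                               # X^N = -1 in this ring
                out[i + j - N] = (out[i + j - N] - ai * bj) % Q
    return out

def to_eval(a):
    """Evaluation representation: the values a(r) at all primitive roots r."""
    return [sum(c * pow(r, k, Q) for k, c in enumerate(a)) % Q for r in ROOTS]

def mul_eval(x, y):
    """Product in evaluation representation: point-wise, O(N) work."""
    return [(u * v) % Q for u, v in zip(x, y)]

a, b = [1, 2, 3, 4], [5, 0, 16, 7]
assert to_eval(mul_coeff(a, b)) == mul_eval(to_eval(a), to_eval(b))
```

The quadratic loop in `mul_coeff` is what evaluation representation avoids; the conversions themselves (here the slow `to_eval`) are what a real implementation does with FFTs.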

We describe variants of key switching and modulus switching that can be implemented while keeping almost all the polynomials in evaluation representation. Our key-switching variant has another advantage, in that it significantly reduces the size of the key-switching matrices in the public key. This is particularly important since one limiting factor for evaluating “interesting” circuits is the ability to keep the key-switching matrices in memory. Other optimizations that we present are meant to reduce the number of modulus-switching and key-switching operations that we need to do.

Our Implementation and tests. Many of the optimizations described in this work were incorporated in the HElib C++ library, which is built on top of NTL (and GnuMP). We tested our implementation on a two-year-old Lenovo X230 laptop with an Intel Core i5-3320M running at 2.6GHz, on which we run an Ubuntu 14.04 VM with 4GB of RAM and with the g++ compiler version 4.9.2. The detailed results of our tests are described in Section 4.4; the one-line summary is that we can evaluate AES-128 homomorphically on 120 blocks in 245 seconds on that commodity laptop. Also, if we need to incorporate extra processing then we can use bootstrapping and get evaluation on 180 blocks in under 18 minutes. All of our programs are single-threaded, so only one core was used in the computations.

We note that there are a multitude of optimizations that one can perform on our basic implementation. Most importantly, there are great gains to be had by making better use of parallelism. Unfortunately, the HElib library is not yet thread-safe, which severely limits our ability to utilize the multi-core functionality of modern processors. Much of the work in homomorphic AES is “embarrassingly parallelizable”, and so we expect a fully parallel implementation to have a speedup factor roughly equal to the number of active cores (with parallelization opportunities not running out until perhaps 100x of the current implementation). The byte-sliced and bit-sliced implementations (which we did not implement on top of HElib) obviously offer even more room for parallelism.

Organization. In Section 2 we review the main features of BGV-type cryptosystems [6, 5], and briefly survey the techniques for homomorphic computation on packed ciphertexts from SV and GHS [29, 15]. Then in Section 3 we describe our “general-purpose” optimizations on a high level, with additional details provided in Appendices A and B. A brief overview of AES, a high-level description and performance numbers are provided in Section 4.

¹The need for coefficient representation ultimately stems from the fact that the noise in the ciphertexts is small in coefficient representation but not in evaluation representation.
2  Background

2.1  Notations  and  Mathematical  Background 

For an integer $q$ we identify the ring $\mathbb{Z}/q\mathbb{Z}$ with the interval $(-q/2, q/2] \cap \mathbb{Z}$, and use $[z]_q$ to denote the reduction of the integer $z$ modulo $q$ into that interval. Our implementation utilizes polynomial rings defined by cyclotomic polynomials, $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$. The ring $\mathbb{A}$ is the ring of integers of the $m$th cyclotomic number field $\mathbb{Q}(\zeta_m)$. We let $\mathbb{A}_q \stackrel{\mathrm{def}}{=} \mathbb{A}/q\mathbb{A} = \mathbb{Z}[X]/(\Phi_m(X), q)$ for the (possibly composite) integer $q$, and we identify $\mathbb{A}_q$ with the set of integer polynomials of degree up to $\phi(m) - 1$ reduced modulo $q$.

Coefficient vs. Evaluation Representation. Let $m, q$ be two integers such that $\mathbb{Z}/q\mathbb{Z}$ contains a primitive $m$-th root of unity, and denote one such primitive $m$-th root of unity by $\zeta \in \mathbb{Z}/q\mathbb{Z}$. Recall that the $m$'th cyclotomic polynomial splits into linear terms modulo $q$, $\Phi_m(X) = \prod_{i \in (\mathbb{Z}/m\mathbb{Z})^*} (X - \zeta^i) \pmod q$.

We consider two ways of representing an element $a \in \mathbb{A}_q$. Viewing $a$ as a degree-$(\phi(m) - 1)$ polynomial, $a(X) = \sum_{i < \phi(m)} a_i X^i$, the coefficient representation of $a$ just lists all the coefficients in order, $\mathbf{a} = \langle a_0, a_1, \ldots, a_{\phi(m)-1} \rangle \in (\mathbb{Z}/q\mathbb{Z})^{\phi(m)}$. For the other representation we consider the values that the polynomial $a(X)$ assumes on all primitive $m$-th roots of unity modulo $q$, $b_i = a(\zeta^i) \bmod q$ for $i \in (\mathbb{Z}/m\mathbb{Z})^*$. The $b_i$'s in order also yield a vector $\mathbf{b} \in (\mathbb{Z}/q\mathbb{Z})^{\phi(m)}$, which we call the evaluation representation of $a$. Clearly these two representations are related via $\mathbf{b} = V_m \cdot \mathbf{a}$, where $V_m$ is the Vandermonde matrix over the primitive $m$-th roots of unity modulo $q$. We remark that for all $i$ we have the equality $(a \bmod (X - \zeta^i)) = a(\zeta^i) = b_i$, hence the evaluation representation of $a$ is just a polynomial Chinese-Remaindering representation.
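The relation $\mathbf{b} = V_m \cdot \mathbf{a}$ can be checked on toy parameters ($m = 8$, $q = 17$, $\zeta = 2$; all illustrative). For a power-of-two $m$ the inverse of $V_m$ has the closed form $a_k = \phi(m)^{-1} \sum_i b_i \zeta^{-ik}$, which is what an inverse FFT evaluates quickly; the sketch below uses the slow quadratic formulas for clarity, and also confirms that $b_i$ is the remainder of $a$ modulo $X - \zeta^i$.

```python
Q, M, N, ZETA = 17, 8, 4, 2                     # toy values: phi(8) = 4, ord(2 mod 17) = 8
EXPS = [i for i in range(1, M) if i % 2 == 1]   # (Z/8Z)^* = [1, 3, 5, 7]

# Vandermonde matrix over the primitive roots: the row for zeta^i holds (zeta^i)^k.
V = [[pow(ZETA, i * k, Q) for k in range(N)] for i in EXPS]

def to_eval(a):
    """b = V * a (mod Q)."""
    return [sum(v * x for v, x in zip(row, a)) % Q for row in V]

def to_coeff(b):
    """Inverse for power-of-two m: a_k = N^{-1} * sum_i b_i * zeta^(-i*k) (mod Q)."""
    n_inv = pow(N, -1, Q)
    return [n_inv * sum(bi * pow(ZETA, (-i * k) % M, Q) for bi, i in zip(b, EXPS)) % Q
            for k in range(N)]

def rem_linear(a, root):
    """Remainder of a(X) modulo (X - root), computed with Horner's rule."""
    acc = 0
    for c in reversed(a):
        acc = (acc * root + c) % Q
    return acc

a = [3, 1, 4, 1]
b = to_eval(a)
assert to_coeff(b) == a                                      # V_m is invertible
assert b == [rem_linear(a, pow(ZETA, i, Q)) for i in EXPS]   # the CRT view
```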

In both representations, an element $a \in \mathbb{A}_q$ is represented by a $\phi(m)$-vector of integers in $\mathbb{Z}/q\mathbb{Z}$. If $q$ is a composite then each of these integers can itself be represented either using the standard binary encoding of integers or using Chinese-Remaindering relative to the factors of $q$. We usually use the standard binary encoding for the coefficient representation and Chinese-Remaindering for the evaluation representation. (Hence the latter representation is really a double-CRT representation, relative to both the polynomial factors of $\Phi_m(X)$ and the integer factors of $q$.)
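For the integer half of the double-CRT representation, each entry modulo a composite $q$ is stored as its residues modulo the prime factors of $q$, and ring operations act on each residue independently. The sketch below assumes a toy $q = 17 \cdot 41 \cdot 73$ (primes chosen, for illustration, to be $1 \bmod 8$) and shows decomposition, a residue-wise product, and CRT interpolation back.

```python
from math import prod

PRIMES = [17, 41, 73]        # toy primes, each congruent to 1 mod 8
Q = prod(PRIMES)             # composite modulus q = 50881

def crt_decompose(x):
    """Integer x mod Q -> its residues modulo each prime factor of Q."""
    return [x % p for p in PRIMES]

def crt_interpolate(residues):
    """Residues -> the unique x in [0, Q) (Chinese Remainder Theorem)."""
    x = 0
    for r, p in zip(residues, PRIMES):
        q_hat = Q // p
        x += r * q_hat * pow(q_hat, -1, p)
    return x % Q

x, y = 12345, 6789
rx, ry = crt_decompose(x), crt_decompose(y)
rxy = [(u * v) % p for u, v, p in zip(rx, ry, PRIMES)]   # no carries between primes
assert crt_interpolate(rxy) == (x * y) % Q
```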

2.2  BGV-type  Cryptosystems 

Our implementation uses a variant of the BGV cryptosystem due to Gentry, Halevi and Smart, specifically the one described in [15, Appendix D] (in the full version). In this cryptosystem both ciphertexts and secret keys are vectors over the polynomial ring $\mathbb{A}$, and the native plaintext space is the space of binary polynomials $\mathbb{A}_2$. (More generally it could be $\mathbb{A}_p$ for some fixed $p \geq 2$, but in our case we will always use $\mathbb{A}_2$.)

At any point during the homomorphic evaluation there is some “current integer modulus $q$” and “current secret key $s$”, that change from time to time. A ciphertext $c$ is decrypted using the current secret key $s$ by taking the inner product over $\mathbb{A}_q$ (with $q$ the current modulus) and then reducing the result modulo 2 in coefficient representation. Namely, the decryption formula is

$$a \leftarrow \Big[\, \underbrace{[\langle c, s \rangle \bmod \Phi_m(X)]_q}_{\text{noise}} \,\Big]_2 . \qquad (1)$$

The polynomial $[\langle c, s \rangle \bmod \Phi_m(X)]_q$ is called the “noise” in the ciphertext $c$. Informally, $c$ is a valid ciphertext with respect to secret key $s$ and modulus $q$ if this noise has “sufficiently small norm” relative to $q$. The meaning of “sufficiently small norm” is whatever is needed to ensure that the noise does not wrap around $q$ when performing homomorphic operations; in our implementation we keep the norm of the noise always below some pre-set bound (which is determined in Appendix C.2).
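Formula (1) can be exercised on a deliberately tiny, insecure toy. The sketch assumes a power-of-two $m = 16$ (so $\Phi_m(X) = X^8 + 1$), a secret key of the form $s = (1, z)$ with ternary $z$, and a fresh ciphertext $c = (c_0, c_1)$ with $\langle c, s \rangle = 2e + a \pmod q$. These choices are made for illustration only; the actual key format and sampling in [15] and in HElib may differ.

```python
import random

N = 8                      # toy ring Z[X]/(X^8 + 1), i.e. m = 16
Q = 2**31 - 1              # toy odd modulus; far too small for real security
ONE = [1] + [0] * (N - 1)

def center(x, q=Q):
    """[x]_q: the representative of x mod q in the interval (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def polymul(a, b):
    """Product in Z[X]/(X^N + 1) (uses X^N = -1, no reduction mod q)."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

def inner(c, s):
    """<c, s> = sum_i c_i * s_i over Z[X]/(X^N + 1)."""
    acc = [0] * N
    for ci, si in zip(c, s):
        acc = [x + y for x, y in zip(acc, polymul(ci, si))]
    return acc

def small(bound):
    return [random.randint(-bound, bound) for _ in range(N)]

def encrypt(msg, s):
    """Toy encryption: c = (-a*z + 2e + msg, a), so <c, s> = 2e + msg (mod Q)."""
    a = [random.randrange(Q) for _ in range(N)]
    az, e = polymul(a, s[1]), small(4)
    return ([center(-x + 2 * y + b) for x, y, b in zip(az, e, msg)], a)

def noise(c, s):
    """The noise polynomial [<c, s> mod Phi_m(X)]_q."""
    return [center(x) for x in inner(c, s)]

def decrypt(c, s):
    """Formula (1): reduce the noise modulo 2, coefficient by coefficient."""
    return [x % 2 for x in noise(c, s)]

s = (ONE, small(1))
msg = [random.randint(0, 1) for _ in range(N)]
c = encrypt(msg, s)
assert decrypt(c, s) == msg
```

Decryption works because the noise $2e + a$ has coefficients of magnitude at most 9 here, far below $q/2$, so the reduction $[\cdot]_q$ recovers it exactly.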
Following [24, 15], the specific norm that we use to evaluate the magnitude of the noise is the “canonical embedding norm reduced mod $q$”; specifically, we use the conventions as described in [15, Appendix D] (in the full version). This is useful to get smaller parameters, but for the purpose of presentation the reader can think of the norm as the Euclidean norm of the noise in coefficient representation. More details are given in the Appendices. We refer to the norm of the noise as the noise magnitude.

The  central  feature  of  BGV-type  cryptosystems  is  that  the  current  secret  key  and  modulus  evolve  as 
we  apply  operations  to  ciphertexts.  We  apply  five  different  operations  to  ciphertexts  during  homomorphic 
evaluation.  Three  of  them  —  addition,  multiplication,  and  automorphism  —  are  “semantic  operations”  that 
we  use  to  evolve  the  plaintext  data  which  is  encrypted  under  those  ciphertexts.  The  other  two  operations 
—  key-switching  and  modulus-switching  —  are  used  for  “maintenance”:  These  operations  do  not  change 
the  plaintext  at  all,  they  only  change  the  current  key  or  modulus  (respectively),  and  they  are  mainly  used 
to  control  the  complexity  of  the  evaluation.  Below  we  briefly  describe  each  of  these  five  operations  on  a 
high  level.  For  the  sake  of  self-containment,  we  also  describe  key  generation  and  encryption  in  Appendix  B. 
A more detailed description can be found in [15, Appendix D].

Addition. Homomorphic addition of two ciphertext vectors with respect to the same secret key and modulus $q$ is done just by adding the vectors over $\mathbb{A}_q$. If the two arguments were encrypting the plaintext polynomials $a_1, a_2 \in \mathbb{A}_2$, then the sum will be an encryption of $a_1 + a_2 \in \mathbb{A}_2$. This operation has no effect on the current modulus or key, and the norm of the noise is at most the sum of norms from the noise in the two arguments.

Multiplication. Homomorphic multiplication is done via tensor product over $\mathbb{A}_q$. In principle, if the two arguments have dimension $n$ over $\mathbb{A}_q$ then the product ciphertext has dimension $n^2$, each entry in the output computed as the product of one entry from the first argument and one entry from the second.²

This operation does not change the current modulus, but it changes the current key: if the two input ciphertexts are valid with respect to the dimension-$n$ secret key vector $s$, encrypting the plaintext polynomials $a_1, a_2 \in \mathbb{A}_2$, then the output is valid with respect to the dimension-$n^2$ secret key $s'$ which is the tensor product of $s$ with itself, and it encrypts the polynomial $a_1 \cdot a_2 \in \mathbb{A}_2$. The norm of the noise in the product ciphertext can be bounded in terms of the product of norms of the noise in the two arguments. For our choice of norm function, the norm of the product is no larger than the product of the norms of the two arguments.
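Homomorphic addition and tensor-product multiplication can both be checked on an insecure toy (power-of-two $m = 16$, assumed key format $s = (1, z)$ with ternary $z$, illustrative sizes). The sketch adds two ciphertexts entry-wise, multiplies them by taking all pairwise products of entries, and decrypts the product under the tensored key $s \otimes s = (1, z, z, z^2)$.

```python
import random

N = 8                      # toy ring Z[X]/(X^8 + 1), i.e. m = 16
Q = 2**31 - 1              # toy odd modulus; insecure, for illustration only
ONE = [1] + [0] * (N - 1)

def center(x, q=Q):
    r = x % q
    return r - q if r > q // 2 else r

def polymul(a, b):
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

def inner(c, s):
    acc = [0] * N
    for ci, si in zip(c, s):
        acc = [x + y for x, y in zip(acc, polymul(ci, si))]
    return acc

def small(bound):
    return [random.randint(-bound, bound) for _ in range(N)]

def encrypt(msg, s):
    a = [random.randrange(Q) for _ in range(N)]
    az, e = polymul(a, s[1]), small(4)
    return ([center(-x + 2 * y + b) for x, y, b in zip(az, e, msg)], a)

def decrypt(c, s):
    return [center(x) % 2 for x in inner(c, s)]

def add_ct(c1, c2):
    """Addition: entry-wise sum over A_q; the key and modulus are unchanged."""
    return tuple([center(x + y) for x, y in zip(u, v)] for u, v in zip(c1, c2))

def tensor(u, v):
    """Tensor product: all pairwise products u_i * v_j (dimension n -> n^2)."""
    return tuple([center(x) for x in polymul(ui, vj)] for ui in u for vj in v)

s = (ONE, small(1))
m1 = [random.randint(0, 1) for _ in range(N)]
m2 = [random.randint(0, 1) for _ in range(N)]
c1, c2 = encrypt(m1, s), encrypt(m2, s)

assert decrypt(add_ct(c1, c2), s) == [(x + y) % 2 for x, y in zip(m1, m2)]
# The product is valid under the tensored key s (x) s = (1, z, z, z^2).
assert decrypt(tensor(c1, c2), tensor(s, s)) == [x % 2 for x in polymul(m1, m2)]
```

The same `tensor` helper builds both the product ciphertext and the product key, so their entries line up; the product noise is the product of the two input noises (at most $8 \cdot 9 \cdot 9$ per coefficient here), illustrating why multiplication is the operation that makes the noise grow fastest.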

Key  Switching.  The  public  key  of  BGV-type  cryptosystems  includes  additional  components  to  enable 
converting  a  valid  ciphertext  with  respect  to  one  key  into  a  valid  ciphertext  encrypting  the  same  plaintext 
with  respect  to  another  key.  For  example,  this  is  used  to  convert  the  product  ciphertext  which  is  valid  with 
respect  to  a  high-dimension  key  back  to  a  ciphertext  with  respect  to  the  original  low-dimension  key. 

To allow conversion from dimension-$n'$ key $s'$ to dimension-$n$ key $s$ (both with respect to the same modulus $q$), we include in the public key a matrix $W = W[s' \to s]$ over $\mathbb{A}_q$, where the $i$'th column of $W$ is roughly an encryption of the $i$'th entry of $s'$ with respect to $s$ (and the current modulus). Then given a valid ciphertext $c'$ with respect to $s'$, we roughly compute $c = W \cdot c'$ to get a valid ciphertext with respect to $s$.

In some more detail, the BGV key-switching transformation first ensures that the norm of the ciphertext $c'$ itself is sufficiently low with respect to $q$. In [5] this was done by working with the binary encoding of $c'$, and one of our main optimizations in this work is a different method for achieving the same goal (cf. Section 3.1). Then, if the $i$'th entry in $s'$ is $s'_i \in \mathbb{A}$ (with norm smaller than $q$), the $i$'th column of $W[s' \to s]$ is an $n$-vector $w_i$ such that $[\langle w_i, s \rangle \bmod \Phi_m(X)]_q = 2e_i + s'_i$ for a low-norm polynomial

²It was shown in [7] that over polynomial rings this operation can be implemented while increasing the dimension only to $2n - 1$ rather than to $n^2$.
$e_i \in \mathbb{A}$. Denoting $e = (e_1, \ldots, e_{n'})$, this means that we have $sW = s' + 2e$ over $\mathbb{A}_q$. For any ciphertext vector $c'$, setting $c = W \cdot c' \in \mathbb{A}_q$ we get the equation

$$[\langle c, s \rangle \bmod \Phi_m(X)]_q = [s W c' \bmod \Phi_m(X)]_q = [\langle c', s' \rangle + 2 \langle c', e \rangle \bmod \Phi_m(X)]_q .$$

Since $c'$, $e$, and $[\langle c', s' \rangle \bmod \Phi_m(X)]_q$ all have low norm relative to $q$, the addition on the right-hand side does not cause a wrap around $q$; hence we get $[[\langle c, s \rangle \bmod \Phi_m(X)]_q]_2 = [[\langle c', s' \rangle \bmod \Phi_m(X)]_q]_2$, as needed. The key-switching operation changes the current secret key from $s'$ to $s$, and does not change the current modulus. The norm of the noise is increased by at most an additive factor of $2\|\langle c', e \rangle\|$.
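The identity $sW = s' + 2e$ and the equation above can be verified mechanically. The sketch below (toy ring $\mathbb{Z}[X]/(X^8+1)$, assumed key format $s = (1, z)$) builds a hypothetical `switching_matrix` whose $i$'th column $w_i$ satisfies $\langle w_i, s \rangle = s'_i + 2e_i \pmod q$, switches from the tensored key $s' = (1, z, z, z^2)$, and checks $\langle W c', s \rangle \equiv \langle c', s' \rangle + 2\langle c', e \rangle \pmod q$ for an arbitrary $c'$. It only checks the algebra: with a full-size $c'$ the term $2\langle c', e \rangle$ is large, which is exactly why $c'$ must first be brought to low norm.

```python
import random

N, Q = 8, 2**31 - 1                 # toy ring Z[X]/(X^8 + 1) and toy modulus
ONE = [1] + [0] * (N - 1)

def polymul(a, b):
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

def polyadd(*ps):
    return [sum(cs) for cs in zip(*ps)]

def inner(c, s):
    """<c, s> mod Q, with coefficients in [0, Q)."""
    return [x % Q for x in polyadd(*[polymul(ci, si) for ci, si in zip(c, s)])]

def small(bound):
    return [random.randint(-bound, bound) for _ in range(N)]

def switching_matrix(s_old, s):
    """Columns w_i = (b_i, a_i) with <w_i, s> = s_old[i] + 2 e_i (mod Q)."""
    cols, errs = [], []
    for si in s_old:
        a, e = [random.randrange(Q) for _ in range(N)], small(2)
        b = [(x + 2 * y - z) % Q for x, y, z in zip(si, e, polymul(a, s[1]))]
        cols.append((b, a))
        errs.append(e)
    return cols, errs

def apply_matrix(cols, c_old):
    """c = W . c_old = sum_i c_old[i] * w_i, computed entry by entry."""
    return tuple([x % Q for x in polyadd(*[polymul(ci, w[k]) for ci, w in zip(c_old, cols)])]
                 for k in range(2))

z = small(1)
s = (ONE, z)
s_old = (ONE, z, z, polymul(z, z))            # tensored key s (x) s
W, e = switching_matrix(s_old, s)
c_old = [[random.randrange(Q) for _ in range(N)] for _ in s_old]
lhs = inner(apply_matrix(W, c_old), s)
rhs = [(x + 2 * y) % Q for x, y in zip(inner(c_old, s_old), inner(c_old, e))]
assert lhs == rhs
```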

Modulus Switching. The modulus-switching operation is intended to reduce the norm of the noise, to compensate for the noise increase that results from all the other operations. To convert a ciphertext $c$ with respect to secret key $s$ and modulus $q$ into a ciphertext $c'$ encrypting the same thing with respect to the same secret key but modulus $q'$, we roughly just scale $c$ by a factor $q'/q$ (thus getting a fractional ciphertext), then round appropriately to get back an integer ciphertext. Specifically, $c'$ is a ciphertext vector satisfying (a) $c' \equiv c \pmod 2$, and (b) the “rounding error term” $\tau \stackrel{\mathrm{def}}{=} c' - (q'/q)c$ has low norm. Converting $c$ to $c'$ is easy in coefficient representation, and one of our optimizations is a method for doing the same in evaluation representation (cf. Section 3.2). This operation leaves the current key $s$ unchanged, changes the current modulus from $q$ to $q'$, and the norm of the noise is changed as $\|n'\| \le (q'/q)\|n\| + \|\tau \cdot s\|$. Note that if the key $s$ has low norm and $q'$ is sufficiently smaller than $q$, then the noise magnitude decreases by this operation.

A BGV-type cryptosystem has a chain of moduli, $q_0 < q_1 < \cdots < q_{L-1}$, where fresh ciphertexts are with respect to the largest modulus $q_{L-1}$. During homomorphic evaluation, every time the (estimated) noise grows too large we apply modulus switching from $q_i$ to $q_{i-1}$ in order to decrease it back. Eventually we get ciphertexts with respect to the smallest modulus $q_0$, and we cannot compute on them anymore (except by using bootstrapping).

Automorphisms. In addition to adding and multiplying polynomials, another useful operation is converting the polynomial $a(X) \in \mathbb{A}$ to $a^{(i)}(X) \stackrel{\mathrm{def}}{=} a(X^i) \bmod \Phi_m(X)$. Denoting by $\kappa_i$ the transformation $\kappa_i : a \mapsto a^{(i)}$, it is a standard fact that the set of transformations $\{\kappa_i : i \in (\mathbb{Z}/m\mathbb{Z})^*\}$ forms a group under composition (which is the Galois group $\mathrm{Gal}(\mathbb{Q}(\zeta_m)/\mathbb{Q})$), and this group is isomorphic to $(\mathbb{Z}/m\mathbb{Z})^*$. In [5, 15] it was shown that applying the transformations $\kappa_i$ to the plaintext polynomials is very useful; some more examples of its use can be found in our Section 4.

Denoting by $c^{(i)}, s^{(i)}$ the vector obtained by applying $\kappa_i$ to each entry in $c, s$, respectively, it was shown in [5, 15] that if $c$ is a valid ciphertext encrypting $a$ with respect to key $s$ and modulus $q$, then $c^{(i)}$ is a valid ciphertext encrypting $a^{(i)}$ with respect to key $s^{(i)}$ and the same modulus $q$. Moreover, the norm of the noise remains the same under this operation. We remark that we can apply key-switching to $c^{(i)}$ in order to get an encryption of $a^{(i)}$ with respect to the original key $s$.
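For a power-of-two $m$ (assumed here only because $\Phi_m(X) = X^{m/2} + 1$ keeps the code short) the map $\kappa_i$ simply permutes coefficients up to sign. The sketch checks the facts used above: $\kappa_i$ is a ring homomorphism, $\kappa_j \circ \kappa_i = \kappa_{ij \bmod m}$, and the multiset of coefficient magnitudes is unchanged. Because $\kappa_i$ respects sums and products, $\langle c^{(i)}, s^{(i)} \rangle = \kappa_i(\langle c, s \rangle)$, which is why $c^{(i)}$ decrypts under $s^{(i)}$.

```python
import random

M = 16                      # toy power-of-two m
N = M // 2                  # Phi_16(X) = X^8 + 1

def polymul(a, b):
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

def kappa(a, i):
    """kappa_i: a(X) -> a(X^i) mod (X^N + 1), for i in (Z/MZ)^* (i.e. odd i)."""
    if i % 2 == 0:
        raise ValueError("i must be a unit modulo M")
    out = [0] * N
    for k, c in enumerate(a):
        e = (i * k) % M            # X^M = 1 in this ring
        if e < N:
            out[e] += c
        else:                      # X^e = -X^(e - N) because X^N = -1
            out[e - N] -= c
    return out

a = [random.randint(-5, 5) for _ in range(N)]
b = [random.randint(-5, 5) for _ in range(N)]
assert kappa(polymul(a, b), 3) == polymul(kappa(a, 3), kappa(b, 3))   # homomorphism
assert kappa(kappa(a, 3), 5) == kappa(a, 15)                          # group law
assert sorted(map(abs, kappa(a, 7))) == sorted(map(abs, a))           # norm preserved
```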

2.3  Computing  on  Packed  Ciphertexts 

Smart and Vercauteren observed [28, 29] that the plaintext space $\mathbb{A}_2$ can be viewed as a vector of “plaintext slots”, by an application of the polynomial Chinese Remainder Theorem. Specifically, if the ring polynomial $\Phi_m(X)$ factors modulo 2 into a product of irreducible factors $\Phi_m(X) = \prod_{j=0}^{\ell-1} F_j(X) \pmod 2$, then a plaintext polynomial $a(X) \in \mathbb{A}_2$ can be viewed as encoding $\ell$ different small polynomials, $a_j = a \bmod F_j$. Just like for integer Chinese Remaindering, addition and multiplication in $\mathbb{A}_2$ correspond to element-wise addition and multiplication of the vectors of slots.
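Packing needs an odd $m$, so that $\Phi_m(X)$ factors into distinct irreducibles modulo 2. The sketch takes the toy value $m = 7$, where $\Phi_7(X) = (X^3 + X + 1)(X^3 + X^2 + 1) \pmod 2$ gives $\ell = 2$ slots holding elements of $\mathrm{GF}(2^3)$. Polynomials over $\mathrm{GF}(2)$ are encoded as integer bit masks (bit $k$ is the coefficient of $X^k$), an encoding chosen only for brevity.

```python
PHI7 = 0b1111111             # Phi_7(X) = X^6 + X^5 + ... + X + 1
F = [0b1011, 0b1101]         # F_0 = X^3 + X + 1, F_1 = X^3 + X^2 + 1

def clmul(a, b):
    """Product of two GF(2)[X] polynomials (carry-less multiplication)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    """Remainder of a modulo f in GF(2)[X]."""
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a

def slots(a):
    """The CRT view: a -> (a mod F_0, ..., a mod F_{l-1})."""
    return [pmod(a, f) for f in F]

assert clmul(F[0], F[1]) == PHI7                  # the factorization modulo 2

a, b = 0b101101, 0b011011                         # two plaintexts in A_2
ab = pmod(clmul(a, b), PHI7)
assert slots(a ^ b) == [x ^ y for x, y in zip(slots(a), slots(b))]
assert slots(ab) == [pmod(clmul(x, y), f) for x, y, f in zip(slots(a), slots(b), F)]
```

Addition in $\mathbb{A}_2$ is XOR of bit masks, so the first check is slot-wise addition and the second is slot-wise multiplication.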
The effect of the automorphisms is a little more involved. When $i$ is a power of two, the transformation $\kappa_i$ is just applied to each slot separately. When $i$ is not a power of two, the transformation $\kappa_i$ has the effect of roughly shifting the values between the different slots. For example, for some parameters we could get a cyclic shift of the vector of slots: if $a$ encodes the vector $(a_0, a_1, \ldots, a_{\ell-1})$, then $\kappa_i(a)$ (for some $i$) could encode the vector $(a_{\ell-1}, a_0, \ldots, a_{\ell-2})$. This was used in [15] to devise efficient procedures for applying arbitrary permutations to the plaintext slots.
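Both cases can be seen on the toy $m = 7$ example (bit-mask encoding of $\mathrm{GF}(2)[X]$, an illustrative choice). For $i = 2$ the automorphism acts inside every slot as the Frobenius map $x \mapsto x^2$ of $\mathrm{GF}(2^3)$. For $i = 3$, which is not a power of two, data moves between slots: a plaintext that is zero in slot 1 and non-zero in slot 0 is mapped to one that is zero in slot 0 and non-zero in slot 1. The value is carried across via an isomorphism between the two copies of $\mathrm{GF}(2^3)$ rather than copied bit for bit, which is one reason the text says “roughly” shifting.

```python
M = 7
PHI7 = 0b1111111             # Phi_7(X) over GF(2); bit k = coefficient of X^k
F = [0b1011, 0b1101]         # its two irreducible factors modulo 2

def clmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a

def slots(a):
    return [pmod(a, f) for f in F]

def kappa(a, i):
    """a(X) -> a(X^i) in GF(2)[X]/Phi_7, using X^7 = 1 modulo Phi_7."""
    out = 0
    for k in range(a.bit_length()):
        if (a >> k) & 1:
            out ^= 1 << ((i * k) % M)
    return pmod(out, PHI7)

a = 0b110101
# i = 2: Frobenius, i.e. squaring independently inside each slot.
assert slots(kappa(a, 2)) == [pmod(clmul(x, x), f) for x, f in zip(slots(a), F)]

# i = 3: the content of slot 0 moves to slot 1.
b = F[1]                                     # zero in slot 1, non-zero in slot 0
assert slots(b)[1] == 0 and slots(b)[0] != 0
assert slots(kappa(b, 3))[0] == 0 and slots(kappa(b, 3))[1] != 0
```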

We note that the values in the plaintext slots are not just bits; rather they are polynomials modulo the irreducible $F_j$'s, so they can be used to represent elements in extension fields $\mathrm{GF}(2^d)$. In particular, in our AES implementations we used the plaintext slots to hold elements of $\mathrm{GF}(2^8)$, and encrypt one byte of the AES state in each slot. Then we can use an adaptation of the techniques from [15] to permute the slots when performing the AES row-shift and column-mix.

3  General-Purpose  Optimizations 

Below we summarize our optimizations that are not tied directly to the AES circuit and can be used also in homomorphic evaluation of other circuits. Underlying many of these optimizations is our choice of keeping ciphertext and key-switching matrices in evaluation (double-CRT) representation. Roughly speaking, our chain of moduli is defined via a set of same-size primes, $p_0, p_1, p_2, \ldots$, chosen such that $\mathbb{Z}/p_i\mathbb{Z}$ has $m$'th roots of unity. (In other words, $m \mid p_i - 1$ for all $i$.) For $i = 0, \ldots, L-1$ we then define our $i$'th modulus as $q_i = \prod_{j=0}^{i} p_j$. To gain efficiency, we actually choose $p_0$ to be half the bit-size of the other $p_i$'s, and so the odd-indexed moduli in the chain are products of the primes starting at $p_0$, while the even-indexed moduli are products that do not include $p_0$. In our implementation the half-sized prime has 23-25 bits (and the full-sized primes therefore have 46-50 bits). For ease of exposition, however, in the rest of this report we ignore this “half-sized” prime and describe all our optimizations as if we were using only a chain of same-size primes.

In the $t$-th level of the scheme we have ciphertexts consisting of elements in $\mathbb{A}_{q_t}$ (i.e., polynomials modulo $(\Phi_m(X), q_t)$). We represent an element $c \in \mathbb{A}_{q_t}$ by a $\phi(m) \times (t+1)$ “matrix” of its evaluations at the primitive $m$-th roots of unity modulo the primes $p_0, \ldots, p_t$. Computing this representation from the coefficient representation of $c$ involves reducing $c$ modulo the $p_i$'s and then $t + 1$ invocations of the FFT algorithm, modulo each of the $p_i$ (picking only the FFT coefficients corresponding to $(\mathbb{Z}/m\mathbb{Z})^*$). To convert back to coefficient representation we invoke the inverse FFT algorithm, each time padding the $\phi(m)$-vector of evaluation points with $m - \phi(m)$ zeros (for the evaluations at the non-primitive roots of unity). This yields the coefficients of the polynomials modulo $(X^m - 1, p_i)$ for $i = 0, \ldots, t$; we then reduce each of these polynomials modulo $(\Phi_m(X), p_i)$ and apply Chinese Remainder interpolation. We stress that we try to perform these transformations as rarely as we can.
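A sketch of building such a chain and the $\phi(m) \times (t+1)$ evaluation matrix, under toy assumptions: a power-of-two $m = 16$ (so the primitive roots are $\zeta^i$ for odd $i$ and $\Phi_{16}(X) = X^8 + 1$), 20-bit primes found by trial division, and naive quadratic evaluation standing in for the FFT. The helper names are hypothetical and are not HElib functions.

```python
from math import prod

M = 16                       # toy power-of-two m
PHI = M // 2                 # phi(16) = 8, Phi_16(X) = X^8 + 1
EXPS = range(1, M, 2)        # (Z/16Z)^* = the odd residues

def is_prime(n):
    """Trial division; adequate for the toy 20-bit sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_chain(count, bits):
    """First `count` primes p >= 2^(bits-1) + 1 with M | p - 1."""
    primes, p = [], (1 << (bits - 1)) + 1      # 2^(bits-1) + 1 is 1 mod M
    while len(primes) < count:
        if is_prime(p):
            primes.append(p)
        p += M                                  # stay in the class 1 mod M
    return primes

def root_of_unity(p):
    """A primitive M-th root of unity modulo p (M a power of two)."""
    for g in range(2, p):
        zeta = pow(g, (p - 1) // M, p)
        if pow(zeta, M // 2, p) == p - 1:       # zeta^(M/2) = -1, so the order is M
            return zeta
    raise ValueError("no primitive root of unity found")

def double_crt(a, primes):
    """phi(M) x (t+1) matrix whose column j holds a(zeta_j^i) mod p_j, i odd."""
    cols = []
    for p in primes:
        zeta = root_of_unity(p)
        cols.append([sum(c * pow(zeta, i * k, p) for k, c in enumerate(a)) % p
                     for i in EXPS])
    return [list(row) for row in zip(*cols)]

primes = prime_chain(3, 20)                                   # p_0, p_1, p_2
moduli = [prod(primes[: t + 1]) for t in range(len(primes))]  # q_t = p_0 ... p_t
a = [5, -3, 0, 7, 1, 0, 0, 2]
matrix = double_crt(a, primes)
```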

3.1  A  New  Variant  of  Key  Switching 

As described in Section 2, the key-switching transformation introduces an additive factor of $2\langle c', e \rangle$ in the noise, where $c'$ is the input ciphertext and $e$ is the noise component in the key-switching matrix. To keep the noise magnitude below the modulus $q$, it seems that we need to ensure that the ciphertext $c'$ itself has low norm. In BGV [5] this was done by representing $c'$ as a fixed linear combination of small vectors, i.e. $c' = \sum_i 2^i c'_i$ with $c'_i$ the vector of $i$'th bits in $c'$. Considering the high-dimension ciphertext $c^* = (c'_0 | c'_1 | c'_2 | \cdots)$ and secret key $s^* = (s' | 2s' | 4s' | \cdots)$, we note that we have $\langle c^*, s^* \rangle = \langle c', s' \rangle$, and $c^*$ has low norm (since it consists of 0-1 polynomials). BGV therefore included in the public key the matrix
$W = W[s^* \to s]$ (rather than $W[s' \to s]$), and had the key-switching transformation compute $c^*$ from $c'$ and set $c = W \cdot c^*$.
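The identity $\langle c^*, s^* \rangle = \langle c', s' \rangle$ behind the BGV approach is easy to check. The sketch assumes a toy ring $\mathbb{Z}[X]/(X^8+1)$ and a toy modulus, takes the entries of $c'$ with coefficients in $[0, q)$, and orders both $c^*$ and $s^*$ bit-level by bit-level. It also makes the cost visible: the dimension grows by a factor of roughly $\log_2 q$.

```python
import random

N = 8                        # toy ring Z[X]/(X^8 + 1)
Q = 2**20 + 7                # toy modulus
BITS = Q.bit_length()        # number of binary digits per coefficient

def polymul(a, b):
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

def inner(c, s):
    """<c, s> mod Q."""
    acc = [0] * N
    for ci, si in zip(c, s):
        acc = [x + y for x, y in zip(acc, polymul(ci, si))]
    return [x % Q for x in acc]

def bit_decompose(c):
    """c' -> c* = (c'_0 | c'_1 | ...), where c'_j holds the j-th bits of c'."""
    return [[(x >> j) & 1 for x in ci] for j in range(BITS) for ci in c]

def powers_of_two(s):
    """s' -> s* = (s' | 2s' | 4s' | ...), in the same order as bit_decompose."""
    return [[(x << j) % Q for x in si] for j in range(BITS) for si in s]

s_prime = ([1] + [0] * (N - 1), [random.randint(-1, 1) for _ in range(N)])
c_prime = [[random.randrange(Q) for _ in range(N)] for _ in s_prime]
c_star, s_star = bit_decompose(c_prime), powers_of_two(s_prime)
assert inner(c_star, s_star) == inner(c_prime, s_prime)
assert all(bit in (0, 1) for entry in c_star for bit in entry)   # low norm
assert len(c_star) == BITS * len(c_prime)                        # but n' grows
```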

When implementing key-switching, there are two drawbacks to the above approach. First, this increases the dimension (and hence the size) of the key-switching matrix. This drawback is fatal when evaluating deep circuits, since having enough memory to keep the key-switching matrices turns out to be a limiting factor in our ability to evaluate such circuits. In addition, for this key-switching we must first convert $c'$ to coefficient representation (in order to compute the $c'_i$'s), then convert each of the $c'_i$'s back to evaluation representation before multiplying by the key-switching matrix. In level $t$ of the circuit, this seems to require $\Omega(t \log q_t)$ FFTs.

In this work we propose a different variant: rather than manipulating $c'$ to decrease its norm, we instead temporarily increase the modulus $q$. We recall that for a valid ciphertext $c'$, encrypting plaintext $a$ with respect to $s'$ and $q$, we have the equality $\langle c', s' \rangle = 2e' + a$ over $\mathbb{A}_q$, for a low-norm polynomial $e'$. This equality, we note, implies that for every odd integer $p$ we have the equality $\langle c', ps' \rangle = 2e'' + a$, holding over $\mathbb{A}_{pq}$, for the “low-norm” polynomial $e''$ (namely $e'' = p \cdot e' + \frac{p-1}{2} a$). Clearly, when considered relative to secret key $ps'$ and modulus $pq$, the noise in $c'$ is $p$ times larger than it was relative to $s'$ and $q$. However, since the modulus is also $p$ times larger, we maintain that the noise has norm sufficiently smaller than the modulus. In other words, $c'$ is still a valid ciphertext that encrypts the same plaintext $a$ with respect to secret key $ps'$ and modulus $pq$. By taking $p$ large enough, we can ensure that the norm of $c'$ (which is independent of $p$) is sufficiently small relative to the modulus $pq$.

We therefore include in the public key a matrix $W = W[ps' \to s]$ modulo $pq$ for a large enough odd integer $p$. (Specifically we need $p \approx q\sqrt{m}$.) Given a ciphertext $c'$, valid with respect to $s'$ and $q$, we apply the key-switching transformation simply by setting $c = W \cdot c'$ over $\mathbb{A}_{pq}$. The additive noise term $\langle c', e \rangle$ that we get is now small enough relative to our large modulus $pq$; thus the resulting ciphertext $c$ is valid with respect to $s$ and $pq$. We can now switch the modulus back to $q$ (using our modulus-switching routine), hence getting a valid ciphertext with respect to $s$ and $q$.
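The whole variant (switch with $W[ps' \to s]$ modulo $pq$, then modulus-switch back to $q$) fits in a short toy. The sketch assumes the ring $\mathbb{Z}[X]/(X^8+1)$, keys of the form $(1, z)$ with ternary $z$, a toy odd $q$ near $2^{20}$ and an odd $p$ near $2^{40}$ (comfortably above $q\sqrt{m}$). It switches a fresh ciphertext from $s' = (1, z')$ to $s = (1, z)$ without any bit decomposition, and performs the final division by $p$ coefficient-wise with parity-preserving rounding. None of these names or sizes come from HElib.

```python
import random

N = 8                          # toy ring Z[X]/(X^8 + 1), m = 16
q = 2**20 + 7                  # toy current modulus
p = 2**40 + 15                 # toy odd auxiliary factor, comfortably above q*sqrt(m)
ONE = [1] + [0] * (N - 1)

def center(x, mod):
    r = x % mod
    return r - mod if r > mod // 2 else r

def polymul(a, b):
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

def lin_comb(polys, coeffs):
    """sum_i polys[i] * coeffs[i] over Z[X]/(X^N + 1)."""
    acc = [0] * N
    for u, v in zip(polys, coeffs):
        acc = [x + y for x, y in zip(acc, polymul(u, v))]
    return acc

def small(bound):
    return [random.randint(-bound, bound) for _ in range(N)]

def encrypt(msg, s, mod):
    a = [random.randrange(mod) for _ in range(N)]
    az, e = polymul(a, s[1]), small(4)
    c0 = [center(-x + 2 * y + b, mod) for x, y, b in zip(az, e, msg)]
    return (c0, [center(x, mod) for x in a])

def decrypt(c, s, mod):
    return [center(x, mod) % 2 for x in lin_comb(c, s)]

def switching_matrix(s_old, s_new):
    """W[p*s_old -> s_new] mod pq: <w_i, s_new> = p*s_old[i] + 2*e_i (mod pq)."""
    cols = []
    for si in s_old:
        a = [random.randrange(p * q) for _ in range(N)]
        az, e = polymul(a, s_new[1]), small(2)
        cols.append(([center(-x + 2 * y + p * z, p * q) for x, y, z in zip(az, e, si)], a))
    return cols

def key_switch(cols, c_old):
    """c = W . c_old over A_pq, with no bit decomposition of c_old."""
    return tuple([center(x, p * q) for x in lin_comb(c_old, [w[k] for w in cols])]
                 for k in range(2))

def scale_down(c):
    """Modulus switch pq -> q: divide by p, keeping each coefficient's parity."""
    out = []
    for entry in c:
        new = []
        for x in entry:
            delta = center(x, p)                   # delta = x (mod p)
            if delta % 2:                          # make delta even, still = x (mod p)
                delta += -p if delta > 0 else p
            new.append(center((x - delta) // p, q))
        out.append(new)
    return tuple(out)

s_old, s_new = (ONE, small(1)), (ONE, small(1))
msg = [random.randint(0, 1) for _ in range(N)]
c_old = encrypt(msg, s_old, q)
c_new = scale_down(key_switch(switching_matrix(s_old, s_new), c_old))
assert decrypt(c_new, s_new, q) == msg
```

Before scaling, the noise is $p \cdot v' + 2\langle c', e \rangle$ for the original noise $v'$; dividing by $p$ shrinks the second term to well under 1, so the final noise stays within a few units of $v'$ despite $c'$ having full-size coefficients.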

We note that even though we no longer break $c'$ into its binary encoding, it seems that we still need to recover it in coefficient representation in order to compute the evaluations of $c' \bmod p$. However, since we do not increase the dimension of the ciphertext vector, this procedure requires only $O(t)$ FFTs in level $t$ (vs. $O(t \log q_t) = O(t^2)$ for the original BGV variant). Also, the size of the key-switching matrix is reduced by roughly the same factor of $\log q_t$.

Our new variant comes with a price tag, however: we use key-switching matrices relative to a larger modulus, but still need the noise term in this matrix to be small. This means that the LWE problem underlying this key-switching matrix has a larger ratio of modulus to noise, implying that we need a larger dimension to get the same level of security as with the original BGV variant. In fact, since our modulus is more than squared (from $q$ to $pq$ with $p > q$), the dimension is increased by more than a factor of two. This translates to more than doubling of the key-switching matrix, partly negating the size and running-time advantage that we get from this variant.

Of course, one can also use a hybrid of the two approaches: we can decrease the norm of $c'$ only somewhat by breaking it into a few digits (as opposed to binary bits as in [5]), and then increase the modulus somewhat until it is large enough relative to the smaller norm of $c'$. The HElib implementation indeed lets us break $c'$ into any number of digits, up to the number of primes in the chain, and in our experiments we used anywhere between 3 and 6 digits to get the right level of security for the different settings.
3.2  Modulus  Switching  in  Evaluation  Representation

Given an element $c \in \mathbb{A}_{q_t}$ in evaluation (double-CRT) representation relative to $q_t = \prod_{j=0}^{t} p_j$, we want to modulus-switch to $q_{t-1}$, i.e., scale down by a factor of $p_t$; we call this operation $\mathsf{Scale}(c, q_t, q_{t-1})$. The output should be $c' \in \mathbb{A}$, represented via the same double-CRT format (with respect to $p_0, \ldots, p_{t-1}$), such that (a) $c' \equiv c \pmod 2$, and (b) the “rounding error term” $\tau = c' - (c/p_t)$ has a very low norm. As $p_t$ is odd, we can equivalently require that the element $c^\dagger \stackrel{\mathrm{def}}{=} p_t \cdot c'$ satisfy

(i) $c^\dagger$ is divisible by $p_t$,

(ii) $c^\dagger \equiv c \pmod 2$, and

(iii) $c^\dagger - c$ (which is equal to $p_t \cdot \tau$) has low norm.

Rather than computing $c'$ directly, we will first compute $c^\dagger$ and then set $c' \leftarrow c^\dagger / p_t$. Observe that once we compute $c^\dagger$ in double-CRT format, it is easy to output also $c'$ in double-CRT format: given the evaluations for $c^\dagger$ modulo $p_j$ ($j < t$), simply multiply them by $p_t^{-1} \bmod p_j$. The algorithm to output $c^\dagger$ in double-CRT format is as follows:

1. Set $\bar{c}$ to be the coefficient representation of $c \bmod p_t$, with coefficients in $(-p_t/2, p_t/2]$. (Computing this requires a single “small FFT” modulo the prime $p_t$.)

2. Add or subtract $p_t$ from every odd coefficient of $\bar{c}$, thus obtaining a polynomial $\delta$ with coefficients in $(-p_t, p_t]$ such that $\delta \equiv \bar{c} \equiv c \pmod{p_t}$ and $\delta \equiv 0 \pmod 2$.

3. Set $c^\dagger = c - \delta$, and output it in double-CRT representation.

Since we already have $c$ in double-CRT representation, we only need the double-CRT representation of $\delta$, which requires $t$ more “small FFTs” modulo the $p_j$'s.

As all the coefficients of $c^\dagger$ are within $p_t$ of those of $c$, the “rounding error term” $\tau = (c^\dagger - c)/p_t$ has coefficients of magnitude at most one, hence it has low norm.
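The three steps can be sketched for a single coefficient, assuming toy odd primes $p_0, \ldots, p_3$ and working on its residues modulo each $p_j$. In the real double-CRT setting each residue vector would also pass through a small FFT; since subtracting $\delta$ and multiplying by $p_t^{-1}$ are linear, that arithmetic can be applied to the evaluations directly, and only $\delta$ itself has to be formed in coefficient representation. The function name `scale` is illustrative, and the toy primes ignore the $m \mid p_j - 1$ condition because no FFT is performed here.

```python
from math import prod

PRIMES = [1009, 1013, 1019, 1021]     # toy odd primes p_0, ..., p_t with t = 3

def center(x, mod):
    r = x % mod
    return r - mod if r > mod // 2 else r

def scale(residues):
    """Scale(c, q_t, q_{t-1}) for one coefficient given by its residues mod p_0..p_t.

    Returns the residues of c' = (c - delta) / p_t modulo p_0, ..., p_{t-1}.
    """
    *low, p_t = PRIMES
    delta = center(residues[-1], p_t)          # step 1: c mod p_t, in (-p_t/2, p_t/2]
    if delta % 2:                              # step 2: make delta even, same class mod p_t
        delta += -p_t if delta > 0 else p_t
    # Step 3: c_dagger = c - delta modulo each p_j, then divide by p_t via p_t^{-1} mod p_j.
    return [((r - delta) * pow(p_t, -1, p_j)) % p_j
            for r, p_j in zip(residues[:-1], low)]

def crt(residues, mods):
    """Reconstruct the integer in [0, prod(mods)) from its residues."""
    big = prod(mods)
    return sum(r * (big // m) * pow(big // m, -1, m) for r, m in zip(residues, mods)) % big

c = -123456789                                 # one coefficient, well inside (-q_t/2, q_t/2]
c_prime = center(crt(scale([c % p for p in PRIMES]), PRIMES[:-1]), prod(PRIMES[:-1]))
assert (c_prime - c) % 2 == 0                       # (a) c' = c (mod 2)
assert abs(c_prime * PRIMES[-1] - c) < PRIMES[-1]   # (b) |c' - c/p_t| < 1
```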

The procedure above uses $t + 1$ small FFTs in total. This should be compared to the naive method of just converting everything to coefficient representation modulo the primes ($t + 1$ FFTs), CRT-interpolating the coefficients, dividing and rounding appropriately the large integers (of size $\approx q_t$), CRT-decomposing the coefficients, and then converting back to evaluation representation ($t + 1$ more FFTs). The above approach makes explicit use of the fact that we are working in a plaintext space modulo 2; in Appendix D we present a technique which works when the plaintext space is defined modulo a larger modulus.

3.3  Dynamic  Noise  Management 

As described in the literature, BGV-type cryptosystems tacitly assume that each homomorphic operation is followed by a modulus switch to reduce the noise magnitude. In our implementation, however, we attach to each ciphertext an estimate of the noise magnitude in that ciphertext, and use these estimates to decide dynamically when a modulus switch must be performed.

Each modulus switch consumes a level, and hence a goal is to reduce, over a computation, the number of levels consumed. By paying particular attention to the parameters of the scheme, and by carefully analyzing how various operations affect the noise, we are able to control the noise much more carefully than in prior work. In particular, we note that modulus-switching is really only necessary just prior to multiplication (when the noise magnitude is about to get squared); at other times it is acceptable to keep the ciphertexts at a higher level (with higher noise).
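The effect of deciding dynamically can be illustrated with a deliberately crude cost model in which the noise is tracked as a base-2 logarithm: addition adds about one bit, multiplication adds the bit-lengths (the magnitudes multiply), and a modulus switch removes the bits of one prime down to a rounding floor. All constants are hypothetical; the sketch only compares an eager policy (switch after every operation) with the lazy one described above (switch only when a product would exceed a pre-set bound).

```python
from dataclasses import dataclass

# Hypothetical toy constants, chosen for illustration only.
PRIME_BITS = 20     # a modulus switch divides the noise by roughly 2^20
FLOOR_BITS = 5      # noise of a fresh ciphertext and the rounding floor after a switch
BOUND_BITS = 30     # pre-set bound that the noise estimate must never exceed

@dataclass
class Ctxt:
    level: int          # index t of the current modulus q_t
    noise_bits: float   # estimated log2 of the noise magnitude

def mod_switch(c):
    """Consume one level and shrink the noise estimate."""
    return Ctxt(c.level - 1, max(c.noise_bits - PRIME_BITS, FLOOR_BITS))

def align(a, b):
    """Bring both arguments to the same (lower) level before combining them."""
    while a.level > b.level:
        a = mod_switch(a)
    while b.level > a.level:
        b = mod_switch(b)
    return a, b

def add(a, b, eager):
    a, b = align(a, b)
    c = Ctxt(a.level, max(a.noise_bits, b.noise_bits) + 1)
    return mod_switch(c) if eager else c

def mul(a, b, eager):
    a, b = align(a, b)
    while a.noise_bits + b.noise_bits > BOUND_BITS:   # switch only when needed
        a, b = mod_switch(a), mod_switch(b)
    c = Ctxt(a.level, a.noise_bits + b.noise_bits)
    return mod_switch(c) if eager else c

def final_level(eager):
    """Compute ((x + y + z) * w)^4 from fresh level-10 ciphertexts."""
    x = y = z = w = Ctxt(10, FLOOR_BITS)
    t = mul(add(add(x, y, eager), z, eager), w, eager)
    t = mul(t, t, eager)
    return mul(t, t, eager).level

print(final_level(eager=True), final_level(eager=False))   # prints: 5 9
```

In this model the eager policy spends a level on every operation, while the lazy policy spends one only before the last squaring, when the estimate would otherwise exceed the bound.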
4  Homomorphic  Evaluation  of  AES


Next we describe our homomorphic implementation of AES-128. Our main implementation is “packed”: the entire AES state is packed into a single ciphertext. Two other possible implementations, byte-slice and bit-slice AES, are described later in Section 4.2. In our earlier work we implemented all three versions, but in the newer work we re-implemented only the “packed” version.

A Brief Overview of AES. The AES-128 cipher consists of ten applications of the same keyed round function, each with a different round key. The round function operates on a $4 \times 4$ matrix of bytes, which are sometimes considered as elements of $\mathbb{F}_{2^8}$. The basic operations performed during the round function are:

- AddKey: an XOR of the current state with 16 bytes of key.
- SubBytes: an inversion in the field $\mathbb{F}_{2^8}$, followed by a fixed $\mathbb{F}_2$-affine map on the bits of the element.
- ShiftRows: rotates the entries in row $i$ of the $4 \times 4$ matrix by $i - 1$ places to the left.
- MixColumns: pre-multiplies the state matrix by a fixed $4 \times 4$ matrix.


Our Packed Representation of the AES State. For our implementation we chose the native plaintext space of our homomorphic encryption to support operations on the finite field $\mathbb{F}_{2^8}$. To this end we choose our ring polynomial $\Phi_m(X)$ so that it factors modulo 2 into degree-$d$ irreducible polynomials with $8 \mid d$. In other words, the smallest integer $d$ such that $m \mid (2^d - 1)$ must be divisible by 8. Our plaintext slots can then hold elements of $\mathbb{F}_{2^d}$. In particular, they can hold elements of $\mathbb{F}_{2^8}$, which is a subfield of $\mathbb{F}_{2^d}$. Since each ciphertext has $\ell = \phi(m)/d$ plaintext slots, we can represent up to $\ell/16$ complete AES state matrices per ciphertext.
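The slot count $\ell = \phi(m)/d$ is determined by $m$ alone, since $d$ is the multiplicative order of 2 modulo $m$. The following brute-force Python sketch illustrates the computation; it is not how HElib does it. It reproduces the numbers for the two parameter sets used in Section 4.4.

```python
from math import gcd


def euler_phi(m):
    """phi(m) by brute force; fine for m in the tens of thousands."""
    return sum(1 for k in range(1, m) if gcd(k, m) == 1)


def order_of_two(m):
    """Smallest d >= 1 with 2^d = 1 (mod m): the degree of each factor of Phi_m mod 2."""
    if m % 2 == 0:
        raise ValueError("m must be odd so that 2 is invertible mod m")
    d, x = 1, 2 % m
    while x != 1:
        x = (2 * x) % m
        d += 1
    return d


def aes_packing(m):
    """Return (phi(m), d, number of slots, number of AES states per ciphertext)."""
    phi = euler_phi(m)
    d = order_of_two(m)
    if d % 8 != 0:
        raise ValueError("F_{2^8} is not a subfield of F_{2^d} for this m")
    slots = phi // d
    return phi, d, slots, slots // 16


print(aes_packing(53261))  # (46080, 24, 1920, 120)
print(aes_packing(28679))  # (23040, 24, 960, 60)
```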

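Moreover, we choose the parameter $m$ so that some element $g \in \mathbb{Z}_m^*$ has order 16 in both $\mathbb{Z}_m^*$ and the quotient group $\mathbb{Z}_m^*/\langle 2 \rangle$. Suppose we put 16 plaintext bytes in slots $t, tg, tg^2, tg^3, \ldots$ (for some $t \in \mathbb{Z}_m^*$). This condition then ensures that the conjugation operation $X \mapsto X^g$ implements a cyclic right shift over these sixteen plaintext bytes. Below we denote the vector of plaintext slots by $\alpha = (\alpha_i)_{i=1}^{\ell}$, with each $\alpha_i \in \mathbb{F}_{2^8}$. We place the 16 bytes of the AES state in plaintext slots using column-first ordering:

$$\alpha = [\alpha_{00}\,\alpha_{10}\,\alpha_{20}\,\alpha_{30}\;\alpha_{01}\,\alpha_{11}\,\alpha_{21}\,\alpha_{31}\;\alpha_{02}\,\alpha_{12}\,\alpha_{22}\,\alpha_{32}\;\alpha_{03}\,\alpha_{13}\,\alpha_{23}\,\alpha_{33}],$$

representing the input plaintext matrix

$$A = \begin{pmatrix} \alpha_{00} & \alpha_{01} & \alpha_{02} & \alpha_{03} \\ \alpha_{10} & \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{20} & \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{30} & \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix}.$$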


4.1  Homomorphic  Evaluation  of  the  Basic  Operations 

We  now  examine  each  AES  operation  in  turn,  and  describe  how  it  is  implemented  homomorphically. 


4.1.1 AddKey and SubBytes

AddKey is a simple addition of ciphertexts. It yields a $4 \times 4$ matrix of bytes, which is the input to the SubBytes operation.
During S-box lookup, each plaintext byte $\alpha_{ij}$ is replaced by $\beta_{ij} = S(\alpha_{ij})$, where $S(\cdot)$ is a fixed permutation on the bytes. $S(x)$ is computed in two steps:

1. Compute $y = x^{-1}$ in $\mathbb{F}_{2^8}$, with 0 mapped to 0.
2. Apply a bitwise affine transformation $z = T(y)$. Here elements of $\mathbb{F}_{2^8}$ are treated as bit strings, with representation polynomial $G(X) = X^8 + X^4 + X^3 + X + 1$.
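The following Python sketch is a plaintext reference for this S-box. It implements $\mathbb{F}_{2^8}$ arithmetic modulo $G(X)$ (0x11B in hex) and computes $S(x) = T(x^{254})$. This works because $x^{254} = x^{-1}$ for $x \neq 0$ and $0^{254} = 0$. It works on ordinary bytes, not on ciphertexts. The bitwise form of $T$, with constant 0x63, is the standard one from FIPS-197.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_{2^8} = F_2[X]/G(X), G(X) = X^8 + X^4 + X^3 + X + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:      # reduce as soon as the degree reaches 8
            a ^= 0x11B
    return r


def gf_pow(a, e):
    """Square-and-multiply exponentiation in F_{2^8}."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def affine_T(y):
    """AES affine map over F_2: z_i = y_i + y_{i+4} + y_{i+5} + y_{i+6} + y_{i+7} + c_i, c = 0x63."""
    z = 0
    for i in range(8):
        bit = ((y >> i) ^ (y >> ((i + 4) % 8)) ^ (y >> ((i + 5) % 8))
               ^ (y >> ((i + 6) % 8)) ^ (y >> ((i + 7) % 8)) ^ (0x63 >> i))
        z |= (bit & 1) << i
    return z


def sbox(x):
    """S(x) = T(x^{-1}), computed as T(x^254) so that 0 is mapped to 0 before T."""
    return affine_T(gf_pow(x, 254))


print(hex(sbox(0x00)), hex(sbox(0x01)), hex(sbox(0x53)))  # 0x63 0x7c 0xed
```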

We implement $\mathbb{F}_{2^8}$ inversion, followed by the $\mathbb{F}_2$ affine transformation, using the Frobenius automorphisms $X \to X^{2^j}$. Recall that the transformation $\kappa_{2^j}(a(X)) = (a(X^{2^j}) \bmod \Phi_m(X))$ is applied separately to each slot. We can therefore use it to transform the vector $(\alpha_i)_{i=1}^{\ell}$ into $(\alpha_i^{2^j})_{i=1}^{\ell}$. Applying the Frobenius automorphisms to ciphertexts has almost no influence on the noise magnitude, so it does not consume any levels.*

Inversion over $\mathbb{F}_{2^8}$ uses essentially the same procedure as Algorithm 2 from [27], computing $\beta = \alpha^{-1} = \alpha^{254}$. This procedure takes only three Frobenius automorphisms and four multiplications, arranged in a depth-3 circuit (see details below).

To apply the AES $\mathbb{F}_2$ affine transformation, we use the fact that any $\mathbb{F}_2$ affine transformation can be computed as an $\mathbb{F}_{2^8}$ affine transformation over the conjugates. Thus there are constants $\gamma_0, \gamma_1, \ldots, \gamma_7, \delta \in \mathbb{F}_{2^8}$ such that the AES affine transformation $T_{AES}(\cdot)$ can be expressed over $\mathbb{F}_{2^8}$ as

$$T_{AES}(\beta) = \delta + \sum_{j=0}^{7} \gamma_j \cdot \beta^{2^j}.$$

We therefore again apply the Frobenius automorphisms to compute eight ciphertexts encrypting the polynomials $\kappa_{2^j}(b)$ for $j = 0, 1, \ldots, 7$. We then take the linear combination with coefficients $\gamma_j$ to get an encryption of the vector $(T_{AES}(\alpha_i^{-1}))_{i=1}^{\ell}$. For our parameters, a multiplication-by-constant operation consumes roughly half a level in terms of added noise.

One subtle implementation detail: our plaintext slots all hold elements of the same field $\mathbb{F}_{2^8}$, but they hold them with respect to different polynomial encodings. The AES affine transformation, on the other hand, is defined with respect to one fixed polynomial encoding. So in the $i$'th slot we must implement not the affine transformation $T_{AES}(\cdot)$ itself, but its projection onto that slot's polynomial encoding. When we take the affine transformation of the eight ciphertexts encrypting $b_j = \kappa_{2^j}(b)$, we do not multiply the encryption of $b_j$ by a constant that has $\gamma_j$ in all the slots. Instead, we multiply it by a constant that has, in slot $i$, the projection of $\gamma_j$ to the polynomial encoding of slot $i$.

Below we give a pseudo-code description of our S-box lookup implementation, together with an approximation of the levels consumed by these operations.


                                                        Level
Input: ciphertext c                                     t
// Compute c_254 = c^(-1)
1. c_2   ← c ≫ 2                                        t        // Frobenius X ↦ X^2
2. c_3   ← c × c_2                                      t − 1    // Multiplication
3. c_12  ← c_3 ≫ 4                                      t − 1    // Frobenius X ↦ X^4
4. c_14  ← c_12 × c_2                                   t − 2    // Multiplication
5. c_15  ← c_12 × c_3                                   t − 2    // Multiplication
6. c_240 ← c_15 ≫ 16                                    t − 2    // Frobenius X ↦ X^16
7. c_254 ← c_240 × c_14                                 t − 3    // Multiplication
// Affine transformation over F_2
8. c'_j  ← c_254 ≫ 2^j  for j = 0, 1, 2, ..., 7         t − 3    // Frobenius X ↦ X^(2^j)
9. c''   ← δ + Σ_{j=0..7} γ_j × c'_j                    t − 3.5  // Linear combination over F_(2^8)
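The level column of the pseudo-code above can be checked mechanically. For each intermediate ciphertext, the sketch below tracks two things: the exponent $e$ such that it encrypts $c^e$, and its multiplicative depth. It uses the same simplifying assumptions as the text:

- Frobenius maps are free.
- Each multiplication costs one level.
- The final $\mathbb{F}_{2^8}$ linear combination costs half a level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    exp: int    # this ciphertext encrypts c^exp
    depth: int  # multiplicative depth so far (levels consumed)


ops = {"frobenius": 0, "mult": 0}


def frob(n, k):
    """Frobenius X -> X^(2^k): raises the slot contents to the power 2^k, no level used."""
    ops["frobenius"] += 1
    return Node(n.exp * 2**k, n.depth)


def mult(a, b):
    """Ciphertext multiplication: exponents add, depth grows by one."""
    ops["mult"] += 1
    return Node(a.exp + b.exp, max(a.depth, b.depth) + 1)


c = Node(1, 0)
c2 = frob(c, 1)         # step 1
c3 = mult(c, c2)        # step 2
c12 = frob(c3, 2)       # step 3
c14 = mult(c12, c2)     # step 4
c15 = mult(c12, c3)     # step 5
c240 = frob(c15, 4)     # step 6
c254 = mult(c240, c14)  # step 7

levels_used = c254.depth + 0.5  # plus half a level for the linear combination (step 9)
print(c254, ops, levels_used)
```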

* It does increase the noise magnitude somewhat, because we need to do key switching after these automorphisms. But this is only a small influence, and we ignore it here.
4.1.2 ShiftRows and MixColumns


As is commonly done, we lump the ShiftRows and MixColumns operations together and view both as a single linear transformation over vectors from $(\mathbb{F}_{2^8})^{16}$. As mentioned above, we carefully choose the parameter $m$ and the placement of the AES state bytes in our plaintext slots. This lets us implement a rotation-by-$i$ of the rows of the AES matrix as a single automorphism operation $X \mapsto X^{g^i}$, for some element $g \in (\mathbb{Z}/m\mathbb{Z})^*$.

Given the ciphertext $c''$ after the SubBytes step, we use these operations together with $\ell$-SELECT operations (as described in [15]). This produces four ciphertexts corresponding to the appropriate permutations of the 16 bytes, in each of the $\ell/16$ different input blocks. These four ciphertexts are combined via a linear operation with coefficients $1$, $X$ and $(1 + X)$ to obtain the final result of this round function.

Moreover, the multiply-by-constant operations implied by $\ell$-SELECT can be folded into the multiply-by-constant operations of the linear transformations. Hence the entire shift-row/mix-column operation consumes only 1/2 level in terms of noise. Finally, the entire procedure can be implemented using only six rotation operations, as described next. Recall our column-first byte ordering of the AES state:


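$$\alpha = [\alpha_{00}\,\alpha_{10}\,\alpha_{20}\,\alpha_{30}\;\alpha_{01}\,\alpha_{11}\,\alpha_{21}\,\alpha_{31}\;\alpha_{02}\,\alpha_{12}\,\alpha_{22}\,\alpha_{32}\;\alpha_{03}\,\alpha_{13}\,\alpha_{23}\,\alpha_{33}] \iff A = \begin{pmatrix} \alpha_{00} & \alpha_{01} & \alpha_{02} & \alpha_{03} \\ \alpha_{10} & \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{20} & \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{30} & \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix}$$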

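We apply three right-rotations to the state vector $\alpha$, by 11, 6 and 1 positions. This gives three vectors $\alpha_{11}, \alpha_6, \alpha_1$, representing the matrices $A_{11}, A_6, A_1$, respectively:

$$\alpha_{11} = [\alpha_{11}\,\alpha_{21}\,\alpha_{31}\,\cdots\,\alpha_{30}\,\alpha_{01}], \qquad \alpha_{6} = [\alpha_{22}\,\alpha_{32}\,\alpha_{03}\,\cdots\,\alpha_{02}\,\alpha_{12}], \qquad \alpha_{1} = [\alpha_{33}\,\alpha_{00}\,\alpha_{10}\,\cdots\,\alpha_{13}\,\alpha_{23}]$$

$$A_{11} = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{10} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} & \alpha_{20} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} & \alpha_{30} \\ \alpha_{02} & \alpha_{03} & \alpha_{00} & \alpha_{01} \end{pmatrix}, \quad A_{6} = \begin{pmatrix} \alpha_{22} & \alpha_{23} & \alpha_{20} & \alpha_{21} \\ \alpha_{32} & \alpha_{33} & \alpha_{30} & \alpha_{31} \\ \alpha_{03} & \alpha_{00} & \alpha_{01} & \alpha_{02} \\ \alpha_{13} & \alpha_{10} & \alpha_{11} & \alpha_{12} \end{pmatrix}, \quad A_{1} = \begin{pmatrix} \alpha_{33} & \alpha_{30} & \alpha_{31} & \alpha_{32} \\ \alpha_{00} & \alpha_{01} & \alpha_{02} & \alpha_{03} \\ \alpha_{10} & \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{20} & \alpha_{21} & \alpha_{22} & \alpha_{23} \end{pmatrix}$$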

Consider the top row of the four matrices, which holds the bytes in positions 0, 4, 8, 12. These top rows are exactly the four rows of the matrix after the shift-row operation. The bytes are therefore aligned, so we can use SIMD operations to compute the column-mix operation. We next multiply these matrices by constants that are 0 in all positions except 0, 4, 8, 12. In those selected positions the constants hold either $1$, $X$ or $X + 1$, and we denote them by $C_1$, $C_X$ and $C_{X+1}$, respectively. Setting

$$B'_0 = A \cdot C_X + (A_1 + A_6) \cdot C_1 + A_{11} \cdot C_{X+1}, \qquad B'_1 = (A + A_1) \cdot C_1 + A_6 \cdot C_{X+1} + A_{11} \cdot C_X,$$
$$B'_2 = (A + A_{11}) \cdot C_1 + A_1 \cdot C_{X+1} + A_6 \cdot C_X, \qquad B'_3 = A \cdot C_{X+1} + A_1 \cdot C_X + (A_6 + A_{11}) \cdot C_1,$$

the top rows of the four $B'_i$'s contain the four rows of the matrix $B$ that results from mix-column. All the other rows of the $B'_i$'s are zero. With all the rows of the result computed, we use three more rotations to move them into place, setting $B = B'_0 + (B'_1 \gg 1) + (B'_2 \gg 2) + (B'_3 \gg 3)$. Pseudo-code for the combined shift-row/mix-column operation is given below:
                                                                  Level
Input: ciphertext c''                                             t − 3.5
10. c''_j ← c'' ≫ j  for j = 0, 1, 6, 11                          t − 3.5  // Rotations
11. c'_0 ← c''_0 · C_X + (c''_1 + c''_6) · C_1 + c''_11 · C_{X+1}  t − 4    // Linear combinations
    c'_1 ← (c''_0 + c''_1) · C_1 + c''_6 · C_{X+1} + c''_11 · C_X
    c'_2 ← (c''_0 + c''_11) · C_1 + c''_1 · C_{X+1} + c''_6 · C_X
    c'_3 ← c''_0 · C_{X+1} + c''_1 · C_X + (c''_6 + c''_11) · C_1
12. Output c'_0 + (c'_1 ≫ 1) + (c'_2 ≫ 2) + (c'_3 ≫ 3)           t − 4    // Assembling the result
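To sanity-check the six-rotation trick, the following sketch simulates steps 10 to 12 on plaintext slot vectors:

- Rotations become cyclic right-shifts of a 16-entry list.
- The constants $C_1, C_X, C_{X+1}$ become masks holding 1, 2 or 3 in positions 0, 4, 8, 12.
- Slot-wise products become $\mathbb{F}_{2^8}$ multiplications.

It then compares the result with a direct ShiftRows followed by MixColumns. It models a single AES block and involves no encryption at all.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in F_{2^8} = F_2[X]/(X^8 + X^4 + X^3 + X + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def rot(v, k):
    """Cyclic right-rotation by k positions: entry p moves to p + k (mod 16)."""
    return [v[(p - k) % 16] for p in range(16)]


def vadd(*vs):
    """Slot-wise addition (XOR) of 16-entry vectors."""
    out = [0] * 16
    for v in vs:
        out = [x ^ y for x, y in zip(out, v)]
    return out


def cmul(v, k):
    """Multiply by the constant C_k: k in positions 0, 4, 8, 12 and 0 elsewhere."""
    return [gf_mul(x, k) if p % 4 == 0 else 0 for p, x in enumerate(v)]


def shift_mix_rotations(alpha):
    """Steps 10-12: six rotations in total, with X = 2 and X + 1 = 3."""
    a0, a1, a6, a11 = alpha, rot(alpha, 1), rot(alpha, 6), rot(alpha, 11)
    b0 = vadd(cmul(a0, 2), cmul(vadd(a1, a6), 1), cmul(a11, 3))
    b1 = vadd(cmul(vadd(a0, a1), 1), cmul(a6, 3), cmul(a11, 2))
    b2 = vadd(cmul(vadd(a0, a11), 1), cmul(a1, 3), cmul(a6, 2))
    b3 = vadd(cmul(a0, 3), cmul(a1, 2), cmul(vadd(a6, a11), 1))
    return vadd(b0, rot(b1, 1), rot(b2, 2), rot(b3, 3))


MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def shift_mix_direct(alpha):
    """Reference: ShiftRows then MixColumns on the column-first state vector."""
    a = [[alpha[4 * j + i] for j in range(4)] for i in range(4)]
    s = [[a[i][(j + i) % 4] for j in range(4)] for i in range(4)]
    out = [0] * 16
    for i in range(4):
        for j in range(4):
            acc = 0
            for k in range(4):
                acc ^= gf_mul(MIX[i][k], s[k][j])
            out[4 * j + i] = acc
    return out


rng = random.Random(2015)
state = [rng.randrange(256) for _ in range(16)]
print(shift_mix_rotations(state) == shift_mix_direct(state))  # True
```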


4.1.3 The Cost of One Round Function

The above description yields an estimate of 4 levels for implementing one round function, which is indeed what we get in our experiments. The time complexity is dominated by the number of key-switching operations. We need one for every multiplication and every automorphism. The per-round count is:

- Byte-substitution: four multiplications and three automorphisms for the inversion, plus seven more automorphisms for the affine transformation, for a total of 14 key-switches.
- Shift-row/mix-column: six more automorphisms.

This gives a grand total of 20 key-switches per round.

The byte-slice implementation in Section 4.2 below would consume the same number of levels but use fewer key-switching operations per round, because its shift-row/column-mix operation no longer needs automorphisms. It would therefore need 14 rather than 20 key-switching operations per round. We expect its amortized complexity to be better by a factor of $20/14 \approx 1.4$. However, it manipulates 16 times as many ciphertexts. Each evaluation would therefore take much longer, by a factor of $16 \cdot 14/20 = 11.2$, and require more memory.
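The key-switch accounting above reduces to a few lines of arithmetic, reproduced here as a small sketch. Like the text, it counts one key-switch per multiplication or automorphism and ignores all other costs.

```python
from fractions import Fraction

INVERSION = 4 + 3   # four multiplications and three Frobenius maps (S-box steps 1 to 7)
AFFINE = 7          # Frobenius maps X -> X^(2^j) for j = 1..7 (j = 0 needs none)
SHIFT_MIX = 6       # rotations in steps 10 to 12

packed = INVERSION + AFFINE + SHIFT_MIX   # one ciphertext holds l/16 AES blocks
byte_sliced = INVERSION + AFFINE          # sixteen ciphertexts hold l AES blocks

# Per block: packed costs 16 * packed / l, byte-sliced costs 16 * byte_sliced / l.
amortized_speedup = Fraction(packed, byte_sliced)
# Per evaluation: byte-sliced switches keys in 16 ciphertexts, packed in one.
per_evaluation_slowdown = Fraction(16 * byte_sliced, packed)

print(packed, byte_sliced, float(amortized_speedup), float(per_evaluation_slowdown))
```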

4.2 Byte- and Bit-Slice Implementations

In the byte-sliced implementation we use sixteen distinct ciphertexts to represent a single state matrix. Since each ciphertext holds $\ell$ plaintext slots, these 16 ciphertexts can hold the states of $\ell$ different AES blocks. In this representation there is no interaction between the slots, so we operate with pure $\ell$-fold SIMD operations. The AddKey and SubBytes steps are exactly as above, except that they are applied to 16 ciphertexts rather than one. The permutations in the ShiftRows/MixColumns step now come “for free”, but the scalar multiplication in MixColumns still consumes 1/2 level in the modulus chain.

In the bit-sliced implementation we represent the entire round function as a binary circuit and use 128 distinct ciphertexts, one per bit of the state matrix. Each set of 128 ciphertexts represents a total of $\ell$ distinct blocks. The main issue is how to make the round-function circuit as shallow as possible, in terms of multiplication gates. Again the difficulty lies in SubBytes, since all the other operations are linear. To implement SubBytes we used the “depth-16” circuit of Boyar and Peralta [3], which consumes four levels. The rest of the round function can be represented as a set of bit-additions. So this method should again consume only four levels per round.

4.3  Using  Bootstrapping 

Without bootstrapping, implementing ten rounds requires over 40 levels in the modulus chain, which means that we need a very large dimension to get security. We could hope to use the “bootstrapping as optimization” technique from BGV [5] to get a smaller dimension and so speed up the computation. As it turns out, however, the reduction in dimension does not compensate for the extra time spent in the recryption procedure itself, so the process does not get faster. Bootstrapping is still needed in applications that further process the result of the AES encryption. Hence our implementation also tested incorporating recryption into the AES computation.

Test          m      φ(m)   lvls  |Q|  security  params/key-gen  Encrypt  Decrypt  memory
no bootstrap  53261  46080  40    886  150-bit   26.45 / 73.03   245.1    394.3    3GB
bootstrap     28679  23040  23    493  123-bit   148.2 / 37.2    1049.9   1630.5   3.7GB

Table 1: Performance results of homomorphic AES. Times are in seconds; the modulus size |Q| includes the extra primes described in Section 3.1.

One avenue for optimization in this case is to recrypt several ciphertexts together. The recryption implementation in HElib handles “fully packed ciphertexts”, whose slots contain elements of $\mathbb{F}_{2^d}$ for some $d$ divisible by 8. Our AES implementation, however, only uses $\mathbb{F}_{2^8}$ elements (i.e. bytes) in the slots. We can therefore recrypt several ciphertexts together, packing $d/8$ bytes into each slot. In this setting most of the AES computation time is spent on recryption. We can therefore process $d/8$ ciphertexts in nearly the same time as a single one, a nearly $d/8$ speedup in amortized time. In our experiments we used $d = 24$, which yields roughly a $3\times$ improvement.

4.4 Performance Details

As remarked in the introduction, we tested our implementations on a two-year-old Lenovo X230 laptop with an Intel Core i5-3320M running at 2.6GHz. The tests ran on an Ubuntu 14.04 VM with 4GB of RAM, using the g++ compiler version 4.9.2. Table 1 summarizes the results.

Non-bootstrapping implementation. For the non-bootstrapping experiment we selected parameters large enough to cope with 40 levels of computation. Appendix C contains our old derivation of the parameters. In our newer implementation we instead used the HElib derivation, which also takes into account the hybrid approach from Section 3.1 and is described in the HElib design document [18, Sec. 3.1.4]. A rule of thumb is that an $L$-level computation needs a dimension of roughly $1000 \cdot L$. Here we worked with the $m$-th cyclotomic for $m = 53261$, which yields lattices of dimension $\phi(m) = 46080$. This setting has 1920 slots, so we can fit $1920/16 = 120$ AES blocks in each ciphertext.

For this setting, key generation took about 1.5 minutes. Roughly 30 seconds were spent computing key-independent tables, and about one minute generating the keys and key-switching matrices. The input to the actual computation consisted of 120 plaintext blocks (in cleartext) and the eleven AES round keys, encrypted in eleven packed ciphertexts using our homomorphic encryption scheme. The homomorphic AES-encryption operation took 252 seconds, a throughput of 2 seconds per block.

Implementation using bootstrapping. Bootstrapping in HElib takes about 12 levels. We therefore chose our parameters to cope with more than 20 levels of computation, so that we can compute at least two AES rounds per recryption. Specifically, we had 23 computation levels and worked with $m = 28679$ and $\phi(m) = 23040$. By our estimates this setting yields 123-bit security (see Equation (8) in Appendix C). It has 960 slots per ciphertext, each holding an element of $\mathbb{F}_{2^{24}}$, which is enough to pack 60 AES blocks.




APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


Key generation for this setting took about four minutes. Three of these were spent computing key-independent tables, and under one minute on generating the keys and key-switching matrices. The input to the actual computation consisted of 180 plaintext blocks (in cleartext) and the same 11 packed ciphertexts encrypting the AES round keys. During the computation we applied the AES operation to three ciphertexts in parallel, and packed them into a single ciphertext before each recryption.

The AES-encryption operation took 1050 seconds. Of these, 823 seconds were spent in two recryption operations and the other 227 seconds on the AES computation of the three ciphertexts. With 180 blocks, this gives a throughput of 5.8 seconds per block. The entire computation used 3.7GB of memory.
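The throughput figures follow from the reported timings by simple division. The sketch below recomputes them using only numbers quoted in this section, including the $d/8 = 3$ ciphertexts packed per recryption for $d = 24$.

```python
# Timings and sizes quoted in this section (seconds, AES blocks).
no_boot_total, no_boot_blocks = 252, 120
d = 24
packed_ciphertexts = d // 8                         # three ciphertexts recrypted together
boot_total, boot_blocks = 1050, packed_ciphertexts * 60
recrypt_time, aes_time = 823, 227                   # split of the bootstrapping run

assert recrypt_time + aes_time == boot_total
print(round(no_boot_total / no_boot_blocks, 1))  # 2.1 (seconds per block)
print(round(boot_total / boot_blocks, 1))        # 5.8 (seconds per block)
```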

Implementing AES decryption. We also implemented the AES decryption operation, essentially by reversing all the operations of the AES-encryption circuit. The operations performed in the two cases are nearly identical, apart from a few multiply-by-constant operations. Yet in our tests decryption was about 60% slower than encryption.

In the non-bootstrapping case, one reason is how the two circuits begin. AES encryption begins with an inversion, which lowers the level of the ciphertext. Decryption begins with the linear operations, which keep the level more or less the same. As a result, operations in decryption are performed 2-3 levels higher than in encryption, and so must manipulate more primes in our chain of moduli. It is not clear to us why this causes such a large slowdown. We speculate that some of it results from memory swapping or other low-level effects.

In the bootstrapping case, the large slowdown arises because the last inversion operation in decryption happens quite low in the chain. This triggers one more recryption operation: three in decryption vs. two in encryption. (This artifact could probably be removed by special-casing the last round, but we did not attempt it.)
A More Details

Following [24, 5, 15, 29] we use rings defined by cyclotomic polynomials, $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$. We let $\mathbb{A}_q$ denote the set of elements of this ring reduced modulo various (possibly composite) moduli $q$. The ring $\mathbb{A}$ is the ring of integers of the $m$-th cyclotomic number field $\mathbb{K}$.

A.1 Plaintext Slots

In our scheme plaintexts are elements of $\mathbb{A}_2$. The polynomial $\Phi_m(X)$ factors modulo 2 into $\ell$ irreducible factors, $\Phi_m(X) = F_1(X) \cdot F_2(X) \cdots F_\ell(X) \pmod 2$, all of degree $d = \phi(m)/\ell$. As in [5, 15, 29], each factor corresponds to a “plaintext slot”. That is, we view a polynomial $a \in \mathbb{A}_2$ as representing an $\ell$-vector $(a \bmod F_i)_{i=1}^{\ell}$.

It is a standard fact that the Galois group $\mathcal{G}al = \mathrm{Gal}(\mathbb{Q}(\zeta_m)/\mathbb{Q})$ consists of the mappings $\kappa_k : a(X) \mapsto a(X^k) \bmod \Phi_m(X)$ for all $k$ coprime to $m$, and that it is isomorphic to $(\mathbb{Z}/m\mathbb{Z})^*$. As noted in [15], for each $i, j \in \{1, 2, \ldots, \ell\}$ there is an element $\kappa_k \in \mathcal{G}al$ that sends an element in slot $i$ to an element in slot $j$. Namely, if $b = \kappa_k(a)$, then the element in the $j$'th slot of $b$ is the same as the one in the $i$'th slot of $a$. In addition, $\mathcal{G}al$ contains the Frobenius elements $X \mapsto X^{2^j}$, which act as Frobenius on each slot separately.

To implement AES we are specifically interested in arithmetic in $\mathbb{F}_{2^8}$, represented as $\mathbb{F}_{2^8} = \mathbb{F}_2[X]/G(X)$ with $G(X) = X^8 + X^4 + X^3 + X + 1$. We choose the parameters so that $d$ is divisible by 8, so $\mathbb{F}_{2^d}$ includes $\mathbb{F}_{2^8}$ as a subfield. This lets us think of the plaintext space as containing $\ell$-vectors over $\mathbb{F}_{2^8}$.
A.2 Canonical Embedding Norm

Following [24], we use the $\ell_\infty$ norm of the canonical embedding as the “size” of a polynomial $a \in \mathbb{A}$. Recall that the canonical embedding of $a \in \mathbb{A}$ into $\mathbb{C}^{\phi(m)}$ is the $\phi(m)$-vector of complex numbers $\sigma(a) = \big(a(\zeta_m^i)\big)_i$. Here $\zeta_m$ is a complex primitive $m$-th root of unity, and the indexes $i$ range over all of $(\mathbb{Z}/m\mathbb{Z})^*$. We call the norm of $\sigma(a)$ the canonical embedding norm of $a$ and denote it by

$$\|a\|^{can} = \|\sigma(a)\|_\infty.$$

We will use the following properties of $\|\cdot\|^{can}$:

• For all $a, b \in \mathbb{A}$ we have $\|a \cdot b\|^{can} \le \|a\|^{can} \cdot \|b\|^{can}$.

• For all $a \in \mathbb{A}$ we have $\|a\|^{can} \le \|a\|_1$.

• There is a ring constant $c_m$, depending only on $m$, such that $\|a\|_\infty \le c_m \cdot \|a\|^{can}$ for all $a \in \mathbb{A}$.

The ring constant is defined by $c_m = \|\mathrm{CRT}_m^{-1}\|_\infty$, where $\mathrm{CRT}_m$ is the CRT matrix for $m$, i.e. the Vandermonde matrix over the complex primitive $m$-th roots of unity. Asymptotically, $c_m$ can grow super-polynomially with $m$. For the “small” values of $m$ used in practice, however, $c_m$ can be evaluated directly. See [11] for a discussion of $c_m$.
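The sketch below illustrates numerically the canonical embedding and the first two properties listed above. It evaluates a coefficient vector at $\zeta_m^i$ for all $i \in (\mathbb{Z}/m\mathbb{Z})^*$ in floating point, using a naive loop rather than an FFT. The power-of-two index $m = 16$ and the random test polynomials are arbitrary choices. Products are left unreduced modulo $\Phi_m(X)$. This does not change their embedding, because $\Phi_m(\zeta_m^i) = 0$.

```python
import cmath
import random
from math import gcd


def canonical_embedding(a, m):
    """sigma(a) = (a(zeta_m^i)) for i in (Z/mZ)^*; a lists coefficients, constant term first."""
    return [
        sum(c * cmath.exp(2j * cmath.pi * i * k / m) for k, c in enumerate(a))
        for i in range(1, m)
        if gcd(i, m) == 1
    ]


def can_norm(a, m):
    """||a||^can = max over i of |a(zeta_m^i)|."""
    return max(abs(z) for z in canonical_embedding(a, m))


def poly_mul(a, b):
    """Plain polynomial product; not reduced mod Phi_m, which leaves sigma unchanged."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


m, phi = 16, 8
rng = random.Random(1)
a = [rng.randint(-3, 3) for _ in range(phi)]
b = [rng.randint(-3, 3) for _ in range(phi)]

print(can_norm(poly_mul(a, b), m) <= can_norm(a, m) * can_norm(b, m) + 1e-9)  # True
print(can_norm(a, m) <= sum(abs(c) for c in a) + 1e-9)                        # True
```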


Canonical Reduction. When working with elements of $\mathbb{A}_q$ for some integer modulus $q$, we sometimes need a version of the canonical embedding norm that behaves well under reduction modulo $q$. Following [15], the canonical embedding norm reduced modulo $q$ of an element $a \in \mathbb{A}$ is the smallest canonical embedding norm of any $a'$ congruent to $a$ modulo $q$. We denote it as

$$|a|_q^{can} = \min\{\, \|a'\|^{can} : a' \in \mathbb{A},\ a' \equiv a \pmod q \,\}.$$

We sometimes also denote the polynomial attaining the minimum by $[a]_q^{can}$, and call it the canonical reduction of $a$ modulo $q$. Neither the canonical embedding norm nor the canonical reduction is used in the scheme itself; we need them only in its analysis. Trivially, $|a|_q^{can} \le \|a\|^{can}$.


A.3 Double CRT Representation

As noted in Section 2, we usually represent an element $a \in \mathbb{A}_q$ in double-CRT representation. This representation is with respect to both the polynomial factors of $\Phi_m(X)$ and the integer factors of $q$. Specifically, we assume that $\mathbb{Z}/q\mathbb{Z}$ contains a primitive $m$-th root of unity, call it $\zeta$, so $\Phi_m(X)$ factors modulo $q$ into linear terms: $\Phi_m(X) = \prod_{j \in (\mathbb{Z}/m\mathbb{Z})^*} (X - \zeta^j) \pmod q$. We also write $q$'s prime factorization as $q = \prod_{i=0}^{t} p_i$. A polynomial $a \in \mathbb{A}_q$ is then represented as the $(t + 1) \times \phi(m)$ matrix of its evaluations at the roots of $\Phi_m(X)$ modulo $p_i$ for $i = 0, \ldots, t$:

$$\mathsf{dble\text{-}CRT}^t(a) = \big(a(\zeta^j) \bmod p_i\big)_{0 \le i \le t,\ j \in (\mathbb{Z}/m\mathbb{Z})^*}.$$

The double-CRT representation can be computed using $t + 1$ invocations of the FFT algorithm modulo the $p_i$, keeping only the FFT coefficients that correspond to elements of $(\mathbb{Z}/m\mathbb{Z})^*$. Inverting it takes three steps:

1. Invoke the inverse FFT algorithm $t + 1$ times, on length-$m$ vectors consisting of the thinned-out values padded with zeros.
2. Apply the Chinese Remainder Theorem.
3. Reduce modulo $\Phi_m(X)$ and $q$.
Addition and multiplication in $\mathbb{A}_q$ can be computed as component-wise addition and multiplication of the entries in the two tables, modulo the appropriate primes $p_i$:

$$\mathsf{dble\text{-}CRT}^t(a + b) = \mathsf{dble\text{-}CRT}^t(a) + \mathsf{dble\text{-}CRT}^t(b),$$
$$\mathsf{dble\text{-}CRT}^t(a \cdot b) = \mathsf{dble\text{-}CRT}^t(a) \cdot \mathsf{dble\text{-}CRT}^t(b).$$

Also consider an element $\kappa_k \in \mathcal{G}al$ of the Galois group, which maps $a(X) \in \mathbb{A}$ to $a(X^k) \bmod \Phi_m(X)$. We can evaluate $\kappa_k(a)$ on the double-CRT representation of $a$ just by permuting the columns of the matrix, sending each column $j$ to column $j \cdot k \bmod m$.
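The following toy double-CRT sketch uses hypothetical parameters: $m = 8$ (so $\Phi_8(X) = X^4 + 1$ and $\phi(m) = 4$) and $q = 17 \cdot 41$. Both primes are $1 \bmod 8$, so each $\mathbb{Z}/p_i\mathbb{Z}$ has a primitive 8th root of unity. The sketch checks how the operations act on the $(t+1) \times \phi(m)$ table:

- ring addition and ring multiplication act entry-wise;
- $\kappa_k$ acts as a column permutation. The entry of $\kappa_k(a)$ at index $j$ equals the entry of $a$ at index $jk \bmod m$.

It uses naive evaluation where real implementations use FFTs.

```python
import random
from math import gcd

M, N = 8, 4                    # toy index m = 8: Phi_8(X) = X^4 + 1, phi(8) = 4
PRIMES = [17, 41]              # toy primes p_0, p_1, both = 1 (mod 8)
Q = PRIMES[0] * PRIMES[1]      # q = p_0 * p_1, so t = 1
UNITS = [j for j in range(1, M) if gcd(j, M) == 1]   # (Z/8Z)^* = [1, 3, 5, 7]


def root_of_unity(p):
    """Some primitive 8th root of unity mod p (for m a power of two: z^(m/2) = -1)."""
    for x in range(2, p):
        z = pow(x, (p - 1) // M, p)
        if pow(z, M // 2, p) == p - 1:
            return z
    raise ValueError("no primitive root of unity found")


ROOTS = [root_of_unity(p) for p in PRIMES]


def dble_crt(a):
    """(t+1) x phi(m) table: row i holds a(zeta_i^j) mod p_i for j in (Z/mZ)^*."""
    return [[sum(c * pow(z, j * k, p) for k, c in enumerate(a)) % p for j in UNITS]
            for p, z in zip(PRIMES, ROOTS)]


def ring_mul(a, b):
    """Product in A_q = Z_q[X]/(X^4 + 1), using X^4 = -1."""
    out = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < N:
                out[i + j] += x * y
            else:
                out[i + j - N] -= x * y
    return [c % Q for c in out]


def automorphism(a, k):
    """kappa_k: a(X) -> a(X^k) mod (X^4 + 1), coefficients mod q."""
    out = [0] * N
    for i, c in enumerate(a):
        e = (i * k) % M              # X^8 = 1 in this ring
        if e < N:
            out[e] += c
        else:
            out[e - N] -= c          # X^(4 + r) = -X^r
    return [c % Q for c in out]


rng = random.Random(7)
a = [rng.randrange(Q) for _ in range(N)]
b = [rng.randrange(Q) for _ in range(N)]
TA, TB = dble_crt(a), dble_crt(b)

sum_ok = dble_crt([(x + y) % Q for x, y in zip(a, b)]) == [
    [(x + y) % p for x, y in zip(ra, rb)] for p, ra, rb in zip(PRIMES, TA, TB)]
prod_ok = dble_crt(ring_mul(a, b)) == [
    [(x * y) % p for x, y in zip(ra, rb)] for p, ra, rb in zip(PRIMES, TA, TB)]
col = {j: n for n, j in enumerate(UNITS)}      # column index of each unit j
TK = dble_crt(automorphism(a, 3))
perm_ok = all(TK[r][col[j]] == TA[r][col[(j * 3) % M]]
              for r in range(len(PRIMES)) for j in UNITS)
print(sum_ok, prod_ok, perm_ok)  # True True True
```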

A.4 Sampling From $\mathbb{A}_q$

At various points we need to sample from $\mathbb{A}_q$ with different distributions, described below. We write $a \leftarrow \mathcal{D}$ for choosing the element $a \in \mathbb{A}$ according to distribution $\mathcal{D}$. The distributions are described over $\phi(m)$-vectors. We always consider them as distributions over the ring $\mathbb{A}$, identifying a polynomial $a \in \mathbb{A}$ with its coefficient vector.

The uniform distribution $\mathcal{U}_q$: this is the uniform distribution over $(\mathbb{Z}/q\mathbb{Z})^{\phi(m)}$, which we identify with $(\mathbb{Z} \cap (-q/2, q/2])^{\phi(m)}$. It is easy to sample from $\mathcal{U}_q$ directly in double-CRT representation.

The “discrete Gaussian” $\mathcal{DG}_q(\sigma^2)$: let $\mathcal{N}(0, \sigma^2)$ denote the normal (Gaussian) distribution on real numbers with zero mean and variance $\sigma^2$. We approximate the discrete Gaussian distribution by drawing from $\mathcal{N}(0, \sigma^2)$ and rounding to the nearest integer. The distribution $\mathcal{DG}_q(\sigma^2)$ works as follows:

1. Draw a real $\phi(m)$-vector according to $\mathcal{N}(0, \sigma^2)^{\phi(m)}$.
2. Round it to the nearest integer vector.
3. Output that integer vector reduced modulo $q$, into the interval $(-q/2, q/2]$.

Sampling small polynomials, $\mathcal{ZO}(\rho)$ and $\mathcal{HWT}(h)$: these distributions produce vectors in $\{0, \pm 1\}^{\phi(m)}$.

• For a real parameter $\rho \in [0, 1]$, $\mathcal{ZO}(\rho)$ draws each entry of the vector from $\{0, \pm 1\}$. Each of $-1$ and $+1$ has probability $\rho/2$, and 0 has probability $1 - \rho$.

• For an integer parameter $h \le \phi(m)$, the distribution $\mathcal{HWT}(h)$ chooses a vector uniformly at random from $\{0, \pm 1\}^{\phi(m)}$, subject to the condition that it has exactly $h$ nonzero entries.
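The four distributions can be sketched directly with Python's random module. These are hypothetical helper names, written for clarity rather than as a cryptographically secure sampler: random.Random is not a CSPRNG. Note that sample_dg takes the standard deviation $\sigma$, while the text parameterizes $\mathcal{DG}_q$ by the variance $\sigma^2$.

```python
import random


def centered(x, q):
    """Reduce x into the interval (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def sample_uniform(n, q, rng):
    """U_q: uniform over (Z/qZ)^n, represented in (-q/2, q/2]."""
    return [centered(rng.randrange(q), q) for _ in range(n)]


def sample_dg(n, sigma, q, rng):
    """DG_q(sigma^2): round N(0, sigma^2) samples, then reduce mod q into (-q/2, q/2]."""
    return [centered(round(rng.gauss(0.0, sigma)), q) for _ in range(n)]


def sample_zo(n, rho, rng):
    """ZO(rho): -1 and +1 with probability rho/2 each, 0 with probability 1 - rho."""
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(-1 if u < rho / 2 else (1 if u < rho else 0))
    return out


def sample_hwt(n, h, rng):
    """HWT(h): uniform over vectors in {0, +-1}^n with exactly h nonzero entries."""
    v = [0] * n
    for pos in rng.sample(range(n), h):
        v[pos] = rng.choice((-1, 1))
    return v


rng = random.Random(0)
n, q = 1024, 12289
u = sample_uniform(n, q, rng)
g = sample_dg(n, 3.2, q, rng)
z = sample_zo(n, 0.5, rng)
w = sample_hwt(n, 64, rng)
print(sum(1 for x in w if x != 0), set(z) <= {-1, 0, 1})  # 64 True
```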

A.5 Canonical embedding norm of random polynomials

In the coming sections we need to bound the canonical embedding norm of polynomials produced by the distributions above, as well as of products of such polynomials. In some cases the norm can be analyzed rigorously using Chernoff and Hoeffding bounds. To set the parameters of our scheme, however, we instead use a heuristic approach that yields better constants:

Let $a \in \mathbb{A}$ be a polynomial chosen by one of the distributions above, so all the (nonzero) coefficients of $a$ are IID (independent and identically distributed). Let $\zeta_m$ be a complex primitive $m$-th root of unity. The evaluation $a(\zeta_m)$ is the inner product between the coefficient vector of $a$ and the fixed vector $z_m = (1, \zeta_m, \zeta_m^2, \ldots)$, which has Euclidean norm exactly $\sqrt{\phi(m)}$. Hence the random variable $a(\zeta_m)$ has variance $V = \sigma^2\phi(m)$, where $\sigma^2$ is the variance of each coefficient of $a$. For the distributions above:

- $a \leftarrow \mathcal{U}_q$: each coefficient has variance $q^2/12$, so $V_U = q^2\phi(m)/12$.
- $a \leftarrow \mathcal{DG}_q(\sigma^2)$: $V_G \approx \sigma^2\phi(m)$.
- $a \leftarrow \mathcal{ZO}(\rho)$: $V_Z = \rho\phi(m)$.
- $a \leftarrow \mathcal{HWT}(h)$: $V_H = h$, not depending on $\phi(m)$, since $a$ has only $h$ nonzero coefficients.
Moreover, the random variable $a(\zeta_m)$ is a sum of many IID random variables. By the central limit theorem it is therefore distributed similarly to a complex Gaussian random variable of the specified variance.* We therefore use $6\sqrt{V}$ (six standard deviations) as a high-probability bound on the size of $a(\zeta_m)$. The evaluations of $a$ at all the roots of unity obey the same bound, so we use six standard deviations as our bound on the canonical embedding norm of $a$. We chose six standard deviations because $\mathrm{erfc}(6) \approx 2^{-55}$. This is good enough for us even after a union bound, i.e. multiplying it by $\phi(m) \approx 2^{16}$.

In many cases we need to bound the canonical embedding norm of a product of two such “random polynomials”. Our task is then to bound the magnitude of the product of two random variables, both distributed close to Gaussians, with variances $\sigma_a^2, \sigma_b^2$ respectively. Here we use $16\sigma_a\sigma_b$ as our bound, since $\mathrm{erfc}(4) \approx 2^{-25}$. The probability that both variables exceed their standard deviation by more than a factor of four is therefore roughly $2^{-50}$.
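The following Monte Carlo sketch tests the heuristic for $a \leftarrow \mathcal{ZO}(\rho)$. It assumes a power-of-two index $m = 128$, so $\phi(m) = 64$ and $a$ has 64 coefficients. It does three things:

- estimates $E|a(\zeta_m)|^2$, which should be close to $V_Z = \rho\,\phi(m)$;
- checks the samples against the $6\sqrt{V}$ bound;
- prints $\log_2 \mathrm{erfc}(6)$ and $\log_2 \mathrm{erfc}(4)$.

The sample size and seed are arbitrary.

```python
import cmath
import math
import random

M, PHI, RHO = 128, 64, 0.5
ZETA = cmath.exp(2j * math.pi / M)          # a complex primitive m-th root of unity
POWERS = [ZETA**k for k in range(PHI)]      # the vector z_m = (1, zeta, zeta^2, ...)


def sample_zo(n, rho, rng):
    """ZO(rho): -1 and +1 with probability rho/2 each, 0 otherwise."""
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(-1 if u < rho / 2 else (1 if u < rho else 0))
    return out


rng = random.Random(42)
values = []
for _ in range(4000):
    a = sample_zo(PHI, RHO, rng)
    values.append(sum(c * zk for c, zk in zip(a, POWERS)))  # a(zeta) = <a, z_m>

V = RHO * PHI                                               # predicted V_Z = rho * phi(m)
ratio = sum(abs(v) ** 2 for v in values) / len(values) / V  # should be close to 1
within_bound = max(abs(v) for v in values) <= 6 * math.sqrt(V)
log_erfc6, log_erfc4 = math.log2(math.erfc(6)), math.log2(math.erfc(4))
print(ratio, within_bound, log_erfc6, log_erfc4)
```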

B The Basic Scheme

We now define our leveled HE scheme on $L$ levels, including the Modulus-Switching and Key-Switching operations and the procedures for KeyGen, Enc, Dec, Add, Mult, Scalar-Mult, and Automorphism.

Recall that a ciphertext vector $c$ in the cryptosystem is a valid encryption of $a \in \mathbb{A}$ with respect to secret key $s$ and modulus $q$ if $[[\langle c, s\rangle]_q]_2 = a$, where the inner product is over $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$, the operation $[\cdot]_q$ denotes modular reduction in coefficient representation into the interval $(-q/2, +q/2]$, and we require that the "noise" $[\langle c, s\rangle]_q$ is sufficiently small (in canonical embedding norm reduced mod $q$). In our implementation a "normal" ciphertext is a 2-vector $c = (c_0, c_1)$, and a "normal" secret key is of the form $s = (1, -\mathfrak{s})$, hence decryption takes the form

$$a \leftarrow [c_0 - c_1\cdot\mathfrak{s}]_q \bmod 2. \tag{2}$$


B.1 Our Moduli Chain

We define the chain of moduli for our depth-$L$ homomorphic evaluation by choosing $L$ "small primes" $p_0, p_1, \ldots, p_{L-1}$, and the $t$'th modulus in our chain is defined as $q_t = \prod_{j=0}^{t} p_j$. (The sizes will be determined later.) The primes $p_i$ are chosen so that for all $i$, $\mathbb{Z}/p_i\mathbb{Z}$ contains a primitive $m$-th root of unity. Hence we can use our double-CRT representation for all $\mathbb{A}_{q_t}$.

This choice of moduli makes it easy to get a level-$(t-1)$ representation of $a \in \mathbb{A}$ from its level-$t$ representation. Specifically, given the level-$t$ double-CRT representation $\text{dble-CRT}^t(a)$ for some $a \in \mathbb{A}_{q_t}$, we can simply remove from the matrix the row corresponding to the last small prime $p_t$, thus obtaining a level-$(t-1)$ representation of $a \bmod q_{t-1} \in \mathbb{A}_{q_{t-1}}$. Similarly we can get the double-CRT representation for lower levels by removing more rows. By a slight abuse of notation we write $\text{dble-CRT}^{t'}(a) = \text{dble-CRT}^{t}(a) \bmod q_{t'}$ for $t' < t$.
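A minimal Python sketch of the modulus chain, under stated assumptions: the helper names, $m = 16$, $L = 4$ and the 20-bit prime size are hypothetical, and only the integer-CRT half of the double-CRT representation (one row of residues per small prime) is modeled, not the evaluations at roots of unity. It shows why dropping the row of $p_t$ yields the level-$(t-1)$ representation.

```python
from math import prod

def is_prime(n):
    """Trial division; adequate for the ~20-bit toy primes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def choose_primes(m, count, bits):
    """Hypothetical helper: `count` primes p = 1 (mod m) of about `bits` bits."""
    primes = []
    p = ((1 << (bits - 1)) // m) * m + 1      # first candidate of the form k*m + 1
    while len(primes) < count:
        if is_prime(p):
            primes.append(p)                  # m | p - 1, so Z/pZ has a primitive m-th root of unity
        p += m
    return primes

def modulus_chain(primes):
    """q_t = p_0 * p_1 * ... * p_t for t = 0, ..., L-1."""
    return [prod(primes[: t + 1]) for t in range(len(primes))]

def crt_rows(coeffs, primes):
    """One row per small prime: coefficients reduced mod p_j (integer-CRT half of double-CRT)."""
    return [[c % p for c in coeffs] for p in primes]

m, L = 16, 4
primes = choose_primes(m, L, bits=20)
q = modulus_chain(primes)
a = [123456789012345, -42, 7, 99999999999]    # coefficients of an element of A_{q_3} (toy)
rows = crt_rows(a, primes)

# Level-2 representation: drop the row for p_3; it equals the representation of a mod q_2.
level2 = rows[:-1]
assert level2 == crt_rows([c % q[2] for c in a], primes[:-1])
```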

Recall that encryption produces ciphertext vectors valid with respect to the largest modulus $q_{L-1}$ in our chain, and we obtain ciphertext vectors valid with respect to smaller moduli whenever we apply modulus switching to decrease the noise magnitude. As described in Section 3.3, our implementation dynamically adjusts levels, performing modulus switching when the dynamically computed noise estimate becomes too large. Hence each ciphertext in our scheme is tagged with both its level $t$ (pinpointing the modulus $q_t$ relative to which this ciphertext is valid) and an estimate $\nu$ of the noise magnitude in this ciphertext. In other words, a ciphertext is a triple $(c, t, \nu)$ with $0 \le t \le L-1$, $c$ a vector over $\mathbb{A}_{q_t}$, and $\nu$ a real number which is used as our noise estimate.

* The mean of $a(\zeta_m)$ is zero, since the coefficients of $a$ are chosen from a zero-mean distribution.

B.2 Modulus Switching

The operation SwitchModulus($c$) takes the ciphertext $c = ((c_0, c_1), t, \nu)$ defined modulo $q_t$ and produces a ciphertext $c' = ((c'_0, c'_1), t-1, \nu')$ defined modulo $q_{t-1}$, such that $[c_0 - \mathfrak{s}\cdot c_1]_{q_t} \equiv [c'_0 - \mathfrak{s}\cdot c'_1]_{q_{t-1}} \pmod 2$, and $\nu'$ is smaller than $\nu$. This procedure makes use of the function Scale($x, q, q'$) that takes an element $x \in \mathbb{A}_q$ and returns an element $y \in \mathbb{A}_{q'}$ such that in coefficient representation it holds that $y \equiv x \pmod 2$, and $y$ is the closest element to $(q'/q)\cdot x$ that satisfies this mod-2 condition.

To maintain the noise estimate, the procedure uses the pre-set ring constant $c_m$ (cf. Appendix A.2) and also a pre-set constant $B_{\text{scale}}$, which is meant to bound the magnitude of the added noise term from this operation. It works as follows:

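SwitchModulus($(c_0, c_1), t, \nu$):

1. If $t < 1$ then abort;

2. $\nu' \leftarrow \frac{q_{t-1}}{q_t}\cdot\nu + B_{\text{scale}}$;

3. If $\nu' > q_{t-1}/(2c_m)$ then abort;

4. $c'_i \leftarrow \text{Scale}(c_i, q_t, q_{t-1})$ for $i = 0, 1$;

5. Output $((c'_0, c'_1), t-1, \nu')$.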
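The following is a toy sketch of Scale and SwitchModulus, not the report's implementation. It assumes the power-of-two ring $\mathbb{Z}[X]/(X^n+1)$ (so $m = 2n$, with $c_m = 1$ taken for this ring), $n = 16$, $h = 4$, and two odd moduli standing in for $q_0 \mid q_1$; all names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that the mod-2 condition in Scale is met exactly.

```python
import math
import random
from fractions import Fraction

def centered(x, q):
    """Reduce the integer x into (-q/2, q/2] (q odd)."""
    r = x % q
    return r - q if r > q // 2 else r

def mul(a, b):
    """Product in Z[X]/(X^n + 1): a toy stand-in for A = Z[X]/Phi_m(X) with m = 2n."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def scale(x, q, q_new):
    """Scale(x, q, q'): per coefficient, the integer closest to (q'/q)*x_i with the parity of x_i."""
    y = []
    for xi in x:
        f = math.floor(Fraction(q_new * xi, q))
        y.append(f if (f - xi) % 2 == 0 else f + 1)
    return y

def switch_modulus(ct, moduli, b_scale, c_m):
    """SwitchModulus((c0, c1), t, nu); moduli[t] holds q_t."""
    (c0, c1), t, nu = ct
    if t < 1:
        raise ValueError("already at the smallest modulus")
    q, q_new = moduli[t], moduli[t - 1]
    nu_new = q_new / q * nu + b_scale
    if nu_new > q_new / (2 * c_m):
        raise ValueError("noise estimate too large; decryption would likely fail")
    return (scale(c0, q, q_new), scale(c1, q, q_new)), t - 1, nu_new

def decrypt_bits(ct, s, moduli):
    (c0, c1), t, _ = ct
    return [centered(x - y, moduli[t]) % 2 for x, y in zip(c0, mul(s, c1))]

# Toy instance (hypothetical parameters): n = 16, h = 4, two odd moduli q_0 | q_1.
rng = random.Random(1)
n, h = 16, 4
moduli = [1_000_003, 1_000_003 * 1_000_033]
s = [0] * n
for i in rng.sample(range(n), h):
    s[i] = rng.choice([-1, 1])                      # HWT(h)-style secret
msg = [rng.randrange(2) for _ in range(n)]
e = [rng.randrange(-3, 4) for _ in range(n)]
q1 = moduli[1]
c1 = [rng.randrange(q1) - q1 // 2 for _ in range(n)]
c0 = [centered(x + 2 * ei + mi, q1) for x, ei, mi in zip(mul(s, c1), e, msg)]
b_scale = 2 * math.sqrt(n / 3) * (8 * math.sqrt(h) + 3)
ct = ((c0, c1), 1, 7.0)                             # |m + 2e| <= 7 coefficient-wise
low = switch_modulus(ct, moduli, b_scale, c_m=1.0)
assert decrypt_bits(low, s, moduli) == msg
```

Because both moduli are odd and Scale preserves parity, the decrypted bits survive the switch as long as the new noise stays below $q_{t-1}/2$.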

The constant $B_{\text{scale}}$ is set as $B_{\text{scale}} = 2\sqrt{\phi(m)/3}\cdot(8\sqrt{h} + 3)$, where $h$ is the Hamming weight of the secret key. (In our implementation we use $h = 64$, so we get $B_{\text{scale}} \approx 77\sqrt{\phi(m)}$.) To justify this choice, we apply the proof of the modulus switching lemma from [15, Lemma 13] (in the full version), relative to the canonical embedding norm. In that proof it is shown that when the noise magnitude in the input ciphertext $c = (c_0, c_1)$ is bounded by $\nu$, then the noise magnitude in the output vector $c' = (c'_0, c'_1)$ is bounded by $\nu' = \frac{q_{t-1}}{q_t}\cdot\nu + \|\langle s, \tau\rangle\|^{\text{can}}$, provided that the last quantity is smaller than $q_{t-1}/2$.

Above, $\tau$ is the "rounding error" vector, namely $\tau = (\tau_0, \tau_1) = (c'_0, c'_1) - \frac{q_{t-1}}{q_t}(c_0, c_1)$. Heuristically assuming that $\tau$ behaves as if its coefficients are chosen uniformly in $[-1, +1]$, the evaluation $\tau_i(\zeta_m)$ at an $m$-th root of unity $\zeta_m$ is distributed close to a complex Gaussian with variance $\phi(m)/3$. Also, $\mathfrak{s}$ was drawn from $\mathcal{HWT}(h)$, so $\mathfrak{s}(\zeta_m)$ is distributed close to a complex Gaussian with variance $h$. Hence we expect $\tau_1(\zeta_m)\mathfrak{s}(\zeta_m)$ to have magnitude at most $16\sqrt{\phi(m)/3\cdot h}$ (recall that we use $h = 64$). We can similarly bound $\tau_0(\zeta_m)$ by $6\sqrt{\phi(m)/3}$, and therefore the evaluation of $\langle s, \tau\rangle$ at $\zeta_m$ is bounded in magnitude (whp) by:

$$16\sqrt{\phi(m)/3\cdot h} + 6\sqrt{\phi(m)/3} = 2\sqrt{\phi(m)/3}\cdot(8\sqrt{h} + 3) \approx 77\sqrt{\phi(m)} = B_{\text{scale}} \tag{3}$$
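The arithmetic behind Equation (3) can be checked directly. This short sketch (hypothetical function names, with $\phi(m) = 2^{16}$ and $h = 64$ chosen for illustration) confirms that the two heuristic terms add up to $B_{\text{scale}}$ and that the constant is about 77.

```python
import math

def b_scale(phi_m, h):
    """B_scale = 2*sqrt(phi(m)/3)*(8*sqrt(h) + 3), Equation (3)."""
    return 2.0 * math.sqrt(phi_m / 3.0) * (8.0 * math.sqrt(h) + 3.0)

def b_scale_terms(phi_m, h):
    """The two heuristic terms: 16*sqrt(phi(m)/3 * h) for tau_1*s and 6*sqrt(phi(m)/3) for tau_0."""
    return 16.0 * math.sqrt(phi_m / 3.0 * h), 6.0 * math.sqrt(phi_m / 3.0)

phi_m, h = 2 ** 16, 64
t1, t0 = b_scale_terms(phi_m, h)
ratio = b_scale(phi_m, h) / math.sqrt(phi_m)   # the constant in B_scale ~ 77*sqrt(phi(m))
print(f"B_scale / sqrt(phi(m)) = {ratio:.2f}")
```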

B.3 Key Switching

After some homomorphic evaluation operations we have on our hands not a "normal" ciphertext which is valid relative to a "normal" secret key, but rather an "extended ciphertext" $((d_0, d_1, d_2), t, \nu)$ which is valid with respect to an "extended secret key" $s' = (1, -\mathfrak{s}, -\mathfrak{s}')$. Namely, this ciphertext encrypts the plaintext $a \in \mathbb{A}$ via

$$a = \big[[d_0 - \mathfrak{s}\cdot d_1 - \mathfrak{s}'\cdot d_2]_{q_t}\big]_2,$$

and the magnitude of the noise $[d_0 - \mathfrak{s}\cdot d_1 - \mathfrak{s}'\cdot d_2]_{q_t}$ is bounded by $\nu$. In our implementation, the component $\mathfrak{s}$ is always the same element $\mathfrak{s} \in \mathbb{A}$ that was drawn from $\mathcal{HWT}(h)$ during key generation, but $\mathfrak{s}'$ can vary depending on the operation. (See the description of multiplication and automorphisms below.)
To enable that translation, we use some "key-switching matrices" that are included in the public key. (In our implementation these "matrices" have dimension $2\times 1$, i.e., they consist of only two elements from $\mathbb{A}$.) As explained in Section 3.1, we save on space and time by artificially "boosting" the modulus we use from $q_t$ up to $P\cdot q_t$ for some "large" modulus $P$. We note that in order to represent elements in $\mathbb{A}_{Pq_t}$ using our dble-CRT representation, we need to choose $P$ so that $\mathbb{Z}/P\mathbb{Z}$ also has primitive $m$-th roots of unity. (In fact, in our implementation we pick $P$ to be a prime.)

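The key-switching "matrix". Denote by $Q = P\cdot q_{L-2}$ the largest modulus relative to which we need to generate key-switching matrices. To generate the key-switching matrix from $s' = (1, -\mathfrak{s}, -\mathfrak{s}')$ to $s = (1, -\mathfrak{s})$ (note that both keys share the same element $\mathfrak{s}$), we choose two elements, one uniform and the other from our "discrete Gaussian",

$$a_{\mathfrak{s},\mathfrak{s}'} \leftarrow \mathcal{U}_Q \quad\text{and}\quad e_{\mathfrak{s},\mathfrak{s}'} \leftarrow \mathcal{DG}_Q(\sigma^2),$$

where $\sigma$ is a global parameter (the standard deviation, so the variance is $\sigma^2$; we later set $\sigma = 3.2$). The "key-switching matrix" then consists of the single column vector

$$W[s' \to s] = \begin{pmatrix} b_{\mathfrak{s},\mathfrak{s}'} \\ a_{\mathfrak{s},\mathfrak{s}'} \end{pmatrix}, \quad\text{where } b_{\mathfrak{s},\mathfrak{s}'} := [\mathfrak{s}\cdot a_{\mathfrak{s},\mathfrak{s}'} + 2e_{\mathfrak{s},\mathfrak{s}'} - P\mathfrak{s}']_Q. \tag{4}$$

Note that $W$ above is defined modulo $Q = Pq_{L-2}$, but we need to use it relative to $Q_t = Pq_t$ for whatever the current level $t$ is. Hence before applying the key-switching procedure at level $t$, we reduce $W$ modulo $Q_t$ to get $W_t := [W]_{Q_t}$. It is important to note that since $Q_t$ divides $Q$, $W_t$ is indeed a key-switching matrix. Namely, it is of the form $(b, a)^T$ with $a \in \mathcal{U}_{Q_t}$ and $b = [\mathfrak{s}\cdot a + 2e_{\mathfrak{s},\mathfrak{s}'} - P\mathfrak{s}']_{Q_t}$ (with respect to the same element $e_{\mathfrak{s},\mathfrak{s}'} \in \mathbb{A}$ from above).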

The SwitchKey procedure. Given the extended ciphertext $c = ((d_0, d_1, d_2), t, \nu)$ and the key-switching matrix $W_t = (b, a)^T$, the procedure SwitchKey$_{W_t}(c)$ proceeds as follows:†

SwitchKey$_{(b,a)}((d_0, d_1, d_2), t, \nu)$:

1. Set $\begin{pmatrix} c'_0 \\ c'_1 \end{pmatrix} \leftarrow \left[\begin{pmatrix} P d_0 \\ P d_1 \end{pmatrix} + \begin{pmatrix} b \\ a \end{pmatrix} d_2\right]_{Q_t}$; // The actual key-switching operation

2. $c''_i \leftarrow \text{Scale}(c'_i, Q_t, q_t)$ for $i = 0, 1$; // Scale the vector back down to $q_t$

3. $\nu' \leftarrow \nu + B_{\text{Ks}}\cdot q_t/P + B_{\text{scale}}$; // The constant $B_{\text{Ks}}$ is determined below

4. Output $((c''_0, c''_1), t, \nu')$.
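To make the procedure concrete, here is a toy sketch of generating $W[\mathfrak{s}^2 \to \mathfrak{s}]$ and running steps 1 and 2 of SwitchKey. Assumptions: the power-of-two ring $\mathbb{Z}[X]/(X^n+1)$ with $n = 16$, a rounded continuous Gaussian in place of $\mathcal{DG}$, a matrix generated directly modulo $Q_t$ rather than reduced from $Q$, and $P = 2^{61}-1$ as an odd stand-in for the large modulus. The noise-estimate bookkeeping of step 3 is omitted and all names are hypothetical. The matrix uses the $-P\mathfrak{s}'$ sign, which is what makes the extended key $(1, -\mathfrak{s}, -\mathfrak{s}')$ switch correctly.

```python
import math
import random
from fractions import Fraction

def centered(x, q):
    r = x % q
    return r - q if r > q // 2 else r

def mul(a, b):
    """Negacyclic product in Z[X]/(X^n + 1) (toy ring, m = 2n)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def scale(x, q, q_new):
    """Closest integer to (q'/q)*x_i with the parity of x_i, coefficient-wise."""
    out = []
    for xi in x:
        f = math.floor(Fraction(q_new * xi, q))
        out.append(f if (f - xi) % 2 == 0 else f + 1)
    return out

def keyswitch_matrix(s, s_prime, P, q_t, rng, sigma=3.2):
    """W_t = (b, a) with b = [s*a + 2e - P*s']_{Q_t}, generated directly modulo Q_t = P*q_t."""
    Q = P * q_t
    n = len(s)
    a = [rng.randrange(Q) for _ in range(n)]
    e = [round(rng.gauss(0.0, sigma)) for _ in range(n)]     # rounded Gaussian in place of DG
    b = [centered(x + 2 * ei - P * sp, Q) for x, ei, sp in zip(mul(s, a), e, s_prime)]
    return b, a

def switch_key(d, W, P, q_t):
    """Steps 1-2 of SwitchKey: form [(P*d0, P*d1) + (b, a)*d2]_{Q_t}, then Scale back to q_t."""
    d0, d1, d2 = d
    b, a = W
    Q = P * q_t
    c0 = [centered(P * x + y, Q) for x, y in zip(d0, mul(b, d2))]
    c1 = [centered(P * x + y, Q) for x, y in zip(d1, mul(a, d2))]
    return scale(c0, Q, q_t), scale(c1, Q, q_t)

rng = random.Random(7)
n, h = 16, 4
q_t = 1_000_003 * 1_000_033          # odd toy modulus at level t
P = 2 ** 61 - 1                       # odd "large" modulus used for boosting
s = [0] * n
for i in rng.sample(range(n), h):
    s[i] = rng.choice([-1, 1])
s2 = mul(s, s)                        # s' = s^2, the case used by Mult
msg = [rng.randrange(2) for _ in range(n)]
d1 = [rng.randrange(q_t) - q_t // 2 for _ in range(n)]
d2 = [rng.randrange(q_t) - q_t // 2 for _ in range(n)]
# Extended ciphertext w.r.t. (1, -s, -s^2): d0 - s*d1 - s^2*d2 = msg (mod q_t).
d0 = [centered(x + y + mi, q_t) for x, y, mi in zip(mul(s, d1), mul(s2, d2), msg)]
W = keyswitch_matrix(s, s2, P, q_t, rng)
c0, c1 = switch_key((d0, d1, d2), W, P, q_t)
recovered = [centered(x - y, q_t) % 2 for x, y in zip(c0, mul(s, c1))]
assert recovered == msg
```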

To argue correctness, observe that although the "actual key-switching operation" from above looks superficially different from the standard key-switching operation $c' \leftarrow W\cdot c$, it is merely an optimization that takes advantage of the fact that both vectors $s'$ and $s$ share the element $\mathfrak{s}$. Indeed, we have the equality over $\mathbb{A}_{Q_t}$:

$$c'_0 - \mathfrak{s}\cdot c'_1 = \big[(P\cdot d_0) + d_2\cdot b_{\mathfrak{s},\mathfrak{s}'} - \mathfrak{s}\cdot\big((P\cdot d_1) + d_2\cdot a_{\mathfrak{s},\mathfrak{s}'}\big)\big]_{Q_t} = \big[P\cdot(d_0 - \mathfrak{s}\cdot d_1 - \mathfrak{s}' d_2) + 2\cdot d_2\cdot e_{\mathfrak{s},\mathfrak{s}'}\big]_{Q_t},$$

so as long as both sides are smaller than $Q_t$ we have the same equality also over $\mathbb{A}$ (without the mod-$Q_t$ reduction), which means that we get

$$[c'_0 - \mathfrak{s}\cdot c'_1]_{Q_t} = \big[P\cdot(d_0 - \mathfrak{s}\cdot d_1 - \mathfrak{s}' d_2) + 2\cdot d_2\cdot e_{\mathfrak{s},\mathfrak{s}'}\big]_{Q_t} \equiv [d_0 - \mathfrak{s}\cdot d_1 - \mathfrak{s}' d_2]_{Q_t} \pmod 2,$$

where the last congruence uses the fact that $P$ is odd.

† For simplicity we describe the SwitchKey procedure as if it always switches back to mod-$q_t$, but in reality, if the noise estimate is large enough, it can switch directly to $q_{t-1}$ instead.

To analyze the size of the added term $2d_2 e_{\mathfrak{s},\mathfrak{s}'}$, we can assume heuristically that $d_2$ behaves like a uniform polynomial drawn from $\mathcal{U}_{q_t}$, hence $d_2(\zeta_m)$ for a complex root of unity $\zeta_m$ is distributed close to a complex Gaussian with variance $q_t^2\phi(m)/12$. Similarly, $e_{\mathfrak{s},\mathfrak{s}'}(\zeta_m)$ is distributed close to a complex Gaussian with variance $\sigma^2\phi(m)$, so $2d_2(\zeta_m)e_{\mathfrak{s},\mathfrak{s}'}(\zeta_m)$ can be modeled as a product of two Gaussians, and we expect that with overwhelming probability it remains smaller than $2\cdot 16\cdot\sqrt{q_t^2\phi(m)/12\cdot\sigma^2\phi(m)} = \frac{16}{\sqrt{3}}\cdot\sigma q_t\phi(m)$. This yields a heuristic bound $\frac{16}{\sqrt{3}}\cdot\sigma\phi(m)\cdot q_t = B_{\text{Ks}}\cdot q_t$ on the canonical embedding norm of the added noise term, and if the total noise magnitude does not exceed $Q_t/(2c_m)$, then also in coefficient representation everything remains below $Q_t/2$. Thus our constant $B_{\text{Ks}}$ is set as

$$B_{\text{Ks}} = \frac{16\sigma\phi(m)}{\sqrt{3}} \approx 9\sigma\phi(m) \tag{5}$$

Finally, dividing by $P$ (which is the effect of the Scale operation), we obtain the final ciphertext that we require, and the noise magnitude is divided by $P$ (except for the added $B_{\text{scale}}$ term).
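A small sketch of the constants in this subsection (function names hypothetical): it checks that doubling the $16\sigma_a\sigma_b$ product bound gives exactly $B_{\text{Ks}}\cdot q_t$ with $B_{\text{Ks}} = 16\sigma\phi(m)/\sqrt{3}$, and evaluates the noise update of step 3 of SwitchKey. The values $\sigma = 3.2$, $\phi(m) = 2^{16}$ and $q_t = 2^{100}$ are illustrative only.

```python
import math

def b_ks(sigma, phi_m):
    """B_Ks = 16*sigma*phi(m)/sqrt(3), Equation (5)."""
    return 16.0 * sigma * phi_m / math.sqrt(3.0)

def keyswitch_added_noise(sigma, phi_m, q_t):
    """2 * 16 * sqrt(Var[d2(zeta)] * Var[e(zeta)]), Var = q_t^2 phi(m)/12 and sigma^2 phi(m)."""
    return 2.0 * 16.0 * math.sqrt(q_t ** 2 * phi_m / 12.0 * sigma ** 2 * phi_m)

def switchkey_noise_estimate(nu, sigma, phi_m, q_t, P, b_scale):
    """Step 3 of SwitchKey: nu' = nu + B_Ks*q_t/P + B_scale."""
    return nu + b_ks(sigma, phi_m) * q_t / P + b_scale

sigma, phi_m, q_t = 3.2, 2 ** 16, 2.0 ** 100
print(b_ks(1.0, 1.0))      # 16/sqrt(3), the "about 9" in B_Ks ~ 9*sigma*phi(m)
```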

B.4 Key Generation, Encryption, and Decryption

The procedures below depend on many parameters ($h$, $\sigma$, $m$, the primes $p_i$ and $P$, etc.). These parameters will be determined later.

KeyGen(): Given the parameters, the key-generation procedure chooses a low-weight secret key and then generates an LWE instance relative to that secret key. Namely, we choose

$$\mathfrak{s} \leftarrow \mathcal{HWT}(h), \quad a \leftarrow \mathcal{U}_{q_{L-1}}, \quad\text{and}\quad e \leftarrow \mathcal{DG}_{q_{L-1}}(\sigma^2).$$

It then sets the secret key as $\mathfrak{s}$ and the public key as $(a, b)$, where $b = [a\cdot\mathfrak{s} + 2e]_{q_{L-1}}$.

In addition, the key-generation procedure adds to the public key some key-switching "matrices", as described in Appendix B.3: specifically, the matrix $W[\mathfrak{s}^2 \to \mathfrak{s}]$ for use in multiplication, and some matrices $W[\kappa_i(\mathfrak{s}) \to \mathfrak{s}]$ for use in automorphisms, for $\kappa_i \in \mathcal{G}al$ whose indexes generate $(\mathbb{Z}/m\mathbb{Z})^*$ (including in particular $\kappa_2$).

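Enc$_{pk}$($m$): To encrypt an element $m \in \mathbb{A}_2$, we choose one "small polynomial" (with $0, \pm 1$ coefficients) and two Gaussian polynomials (with variance $\sigma^2$),

$$v \leftarrow \mathcal{ZO}(0.5) \quad\text{and}\quad e_0, e_1 \leftarrow \mathcal{DG}_{q_{L-1}}(\sigma^2).$$

Then we set $c_0 = b\cdot v + 2\cdot e_0 + m$, $c_1 = a\cdot v + 2\cdot e_1$, and set the initial ciphertext as $c' = (c_0, c_1, L-1, B_{\text{clean}})$, where $B_{\text{clean}}$ is a parameter that we determine below.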

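The noise magnitude in this ciphertext ($B_{\text{clean}}$) is a little larger than what we would like, so before we start computing on it we do one modulus switch. That is, the encryption procedure sets $c \leftarrow \text{SwitchModulus}(c')$ and outputs $c$. We can deduce a value for $B_{\text{clean}}$ as follows:

$$\begin{aligned}
|c_0 - \mathfrak{s}\cdot c_1|_{q_t}^{\text{can}} &\le \|c_0 - \mathfrak{s}\cdot c_1\|^{\text{can}} \\
&= \|(a\cdot\mathfrak{s} + 2\cdot e)\cdot v + 2\cdot e_0 + m - (a\cdot v + 2\cdot e_1)\cdot\mathfrak{s}\|^{\text{can}} \\
&= \|m + 2\cdot(e\cdot v + e_0 - e_1\cdot\mathfrak{s})\|^{\text{can}} \\
&\le \|m\|^{\text{can}} + 2\cdot\big(\|e\cdot v\|^{\text{can}} + \|e_0\|^{\text{can}} + \|e_1\cdot\mathfrak{s}\|^{\text{can}}\big)
\end{aligned}$$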
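Using our complex Gaussian heuristic from Appendix A.5, we can bound the canonical embedding norm of the randomized terms above by

$$\|e\cdot v\|^{\text{can}} \le 16\sigma\phi(m)/\sqrt{2}, \qquad \|e_0\|^{\text{can}} \le 6\sigma\sqrt{\phi(m)}, \qquad \|e_1\cdot\mathfrak{s}\|^{\text{can}} \le 16\sigma\sqrt{h\cdot\phi(m)}.$$

Also, the norm of the input message $m$ is clearly bounded by $\phi(m)$, hence (when we substitute our parameters $h = 64$ and $\sigma = 3.2$) we get the bound

$$\phi(m) + 32\sigma\phi(m)/\sqrt{2} + 12\sigma\sqrt{\phi(m)} + 32\sigma\sqrt{h\cdot\phi(m)} \approx 74\phi(m) + 858\sqrt{\phi(m)} = B_{\text{clean}} \tag{6}$$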

Our goal in the initial modulus switching from $q_{L-1}$ to $q_{L-2}$ is to reduce the noise from its initial level of $B_{\text{clean}} = \Theta(\phi(m))$ to our baseline bound of $B = \Theta(\sqrt{\phi(m)})$, which is determined in Equation (12) below.
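The numbers in Equation (6) can be reproduced with a few lines of Python. This sketch assumes $\sigma = 3.2$ and $h = 64$, and uses the baseline $B \approx 154\sqrt{\phi(m)}$ from Equation (12) only to illustrate that $B_{\text{clean}}/B$ grows like $\sqrt{\phi(m)}$. Function names are hypothetical.

```python
import math

def b_clean(phi_m, sigma=3.2, h=64):
    """Equation (6): phi(m) + 32*sigma*phi(m)/sqrt(2) + 12*sigma*sqrt(phi(m)) + 32*sigma*sqrt(h*phi(m))."""
    return (phi_m
            + 32 * sigma * phi_m / math.sqrt(2)
            + 12 * sigma * math.sqrt(phi_m)
            + 32 * sigma * math.sqrt(h * phi_m))

def baseline_b(phi_m):
    """Baseline bound B ~ 154*sqrt(phi(m)) from Equation (12)."""
    return 154 * math.sqrt(phi_m)

linear = 1 + 32 * 3.2 / math.sqrt(2)        # coefficient of phi(m), rounded up to 74 in the text
sqrt_coeff = 12 * 3.2 + 32 * 3.2 * 8        # coefficient of sqrt(phi(m)), about 858
for phi_m in (2 ** 12, 2 ** 16):
    print(phi_m, b_clean(phi_m) / baseline_b(phi_m))   # grows like sqrt(phi(m))
```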

Dec$_{sk}$($c$): Decryption of a ciphertext $(c_0, c_1, t, \nu)$ at level $t$ is performed by setting $m' \leftarrow [c_0 - \mathfrak{s}\cdot c_1]_{q_t}$, then converting $m'$ to coefficient representation and outputting $m' \bmod 2$. This procedure works when $c_m\cdot\nu < q_t/2$, so it only applies when the constant $c_m$ for the ring $\mathbb{A}$ is known and relatively small (which, as we mentioned above, will be true for all practical parameters). Also, we must pick the smallest prime $q_0 = p_0$ large enough, as described in Appendix C.2.
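A toy end-to-end sketch of KeyGen, Enc and Dec, assuming the power-of-two ring $\mathbb{Z}[X]/(X^n+1)$ with $n = 16$, a single odd modulus in place of $q_{L-1}$, and a rounded continuous Gaussian in place of $\mathcal{DG}$. It omits the initial modulus switch, and all names are hypothetical. It is meant to show the algebra $c_0 - \mathfrak{s}c_1 = m + 2(e v + e_0 - e_1\mathfrak{s})$, not realistic parameters.

```python
import random

def centered(x, q):
    r = x % q
    return r - q if r > q // 2 else r

def mul(a, b):
    """Negacyclic product in Z[X]/(X^n + 1) (toy ring, m = 2n)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def small_gaussian(n, rng, sigma=3.2):
    """Rounded continuous Gaussian: a simple stand-in for DG_q(sigma^2)."""
    return [round(rng.gauss(0.0, sigma)) for _ in range(n)]

def keygen(n, h, q, rng):
    s = [0] * n
    for i in rng.sample(range(n), h):
        s[i] = rng.choice([-1, 1])                         # s <- HWT(h)
    a = [rng.randrange(q) - q // 2 for _ in range(n)]      # a <- U_q
    e = small_gaussian(n, rng)
    b = [centered(x + 2 * ei, q) for x, ei in zip(mul(a, s), e)]
    return s, (a, b)

def encrypt(pk, msg, q, rng):
    a, b = pk
    n = len(a)
    v = [rng.choice([0, 0, 1, -1]) for _ in range(n)]      # v <- ZO(0.5)
    e0, e1 = small_gaussian(n, rng), small_gaussian(n, rng)
    c0 = [centered(x + 2 * y + mi, q) for x, y, mi in zip(mul(b, v), e0, msg)]
    c1 = [centered(x + 2 * y, q) for x, y in zip(mul(a, v), e1)]
    return c0, c1

def decrypt(s, ct, q):
    c0, c1 = ct
    return [centered(x - y, q) % 2 for x, y in zip(c0, mul(s, c1))]

rng = random.Random(2024)
n, h, q = 16, 4, 1_000_003 * 1_000_033
s, pk = keygen(n, h, q, rng)
msg = [rng.randrange(2) for _ in range(n)]
ct = encrypt(pk, msg, q, rng)
assert decrypt(s, ct, q) == msg
```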

B.5 Homomorphic Operations

Add($c, c'$): Given two ciphertexts $c = ((c_0, c_1), t, \nu)$ and $c' = ((c'_0, c'_1), t', \nu')$, representing messages $m, m' \in \mathbb{A}_2$, this algorithm forms a ciphertext $c_a = ((a_0, a_1), t_a, \nu_a)$ which encrypts the message $m_a = m + m'$.

If the two ciphertexts do not belong to the same level, then we reduce the larger one modulo the smaller of the two moduli, thus bringing them to the same level. (This simple modular reduction works as long as the noise magnitude is smaller than the smaller of the two moduli; if this condition does not hold, then we need to do modulus switching rather than simple modular reduction.) Once the two ciphertexts are at the same level (call it $t''$), we just add the two ciphertext vectors and the two noise estimates to get

$$c_a = \big(([c_0 + c'_0]_{q_{t''}}, [c_1 + c'_1]_{q_{t''}}), t'', \nu + \nu'\big).$$
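The level alignment in Add can be sketched as follows, assuming the toy ring $\mathbb{Z}[X]/(X^n+1)$, two odd moduli $q_0 \mid q_1$, and a hypothetical secret-key `toy_encrypt` helper used only to produce test ciphertexts. Since $q_{t''}$ divides the larger modulus, plain reduction keeps $c_0 - \mathfrak{s}c_1$ congruent to the noise.

```python
import random

def centered(x, q):
    r = x % q
    return r - q if r > q // 2 else r

def mul(a, b):
    """Negacyclic product in Z[X]/(X^n + 1) (toy ring)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def add(ct, ct2, moduli):
    """Add: bring both ciphertexts to the lower level by plain reduction, then add."""
    (c0, c1), t, nu = ct
    (d0, d1), t2, nu2 = ct2
    t_low = min(t, t2)
    q = moduli[t_low]                 # q_{t''} divides both moduli, so reducing mod q is valid
    a0 = [centered(x + y, q) for x, y in zip(c0, d0)]
    a1 = [centered(x + y, q) for x, y in zip(c1, d1)]
    return (a0, a1), t_low, nu + nu2

def toy_encrypt(s, msg, t, moduli, rng):
    """Hypothetical secret-key encryption at level t: c0 = s*c1 + 2e + m (mod q_t)."""
    q = moduli[t]
    c1 = [rng.randrange(q) - q // 2 for _ in range(len(s))]
    e = [rng.randrange(-3, 4) for _ in range(len(s))]
    c0 = [centered(x + 2 * ei + mi, q) for x, ei, mi in zip(mul(s, c1), e, msg)]
    return (c0, c1), t, 7.0

def decrypt_bits(ct, s, moduli):
    (c0, c1), t, _ = ct
    return [centered(x - y, moduli[t]) % 2 for x, y in zip(c0, mul(s, c1))]

rng = random.Random(3)
n = 16
moduli = [1_000_003, 1_000_003 * 1_000_033]
s = [rng.choice([-1, 0, 0, 0, 1]) for _ in range(n)]
m1 = [rng.randrange(2) for _ in range(n)]
m2 = [rng.randrange(2) for _ in range(n)]
ct_sum = add(toy_encrypt(s, m1, 1, moduli, rng), toy_encrypt(s, m2, 0, moduli, rng), moduli)
assert decrypt_bits(ct_sum, s, moduli) == [(x + y) % 2 for x, y in zip(m1, m2)]
```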

Mult($c, c'$): Given two ciphertexts representing messages $m, m' \in \mathbb{A}_2$, this algorithm forms a ciphertext that encrypts the message $m\cdot m'$.

We begin by ensuring that the noise magnitude in both ciphertexts is smaller than the pre-set constant $B$ (which is our baseline bound and is determined in Equation (12) below), performing modulus switching as needed to ensure this condition. Then we bring both ciphertexts to the same level by reducing modulo the smaller of the two moduli (if needed). Once both ciphertexts have small noise magnitude and the same level, we form the extended ciphertext (essentially performing the tensor product of the two) and apply key switching to get back a normal ciphertext. A pseudo-code description of this procedure is given below.

Mult($c, c'$):

1. While $\nu(c) > B$ do $c \leftarrow \text{SwitchModulus}(c)$; // $\nu(c)$ is the noise estimate in $c$

2. While $\nu(c') > B$ do $c' \leftarrow \text{SwitchModulus}(c')$; // $\nu(c')$ is the noise estimate in $c'$

3. Bring $c, c'$ to the same level $t$ by reducing modulo the smaller of the two moduli.
Denote after modular reduction $c = ((c_0, c_1), t, \nu)$ and $c' = ((c'_0, c'_1), t, \nu')$.

4. Set $(d_0, d_1, d_2) \leftarrow (c_0\cdot c'_0,\; c_1\cdot c'_0 + c_0\cdot c'_1,\; -c_1\cdot c'_1)$;
Denote $c'' = ((d_0, d_1, d_2), t, \nu\cdot\nu')$.

5. Output SwitchKey$_{W[\mathfrak{s}^2 \to \mathfrak{s}]}(c'')$ // Convert to a "normal" ciphertext

We stress that the only place where we force modulus switching is before the multiplication operation. In all other operations we allow the noise to grow, and it will be reduced back the first time it is input to a multiplication operation. We also note that we may need to apply modulus switching more than once before the noise is small enough.
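Steps 3 and 4 of Mult can be sketched in the same toy setting (ring $\mathbb{Z}[X]/(X^n+1)$, odd moduli $q_0 \mid q_1$, hypothetical `toy_encrypt` helper). The sketch stops before key switching and instead decrypts the extended ciphertext directly under $(1, -\mathfrak{s}, -\mathfrak{s}^2)$, which is enough to see that the tensor product encrypts $m\cdot m'$ with noise estimate $\nu\cdot\nu'$.

```python
import random

def centered(x, q):
    r = x % q
    return r - q if r > q // 2 else r

def mul(a, b):
    """Negacyclic product in Z[X]/(X^n + 1) (toy ring)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def toy_encrypt(s, msg, t, moduli, rng):
    """Hypothetical secret-key encryption at level t with noise m + 2e, |m + 2e| <= 7."""
    q = moduli[t]
    c1 = [rng.randrange(q) - q // 2 for _ in range(len(s))]
    e = [rng.randrange(-3, 4) for _ in range(len(s))]
    c0 = [centered(x + 2 * ei + mi, q) for x, ei, mi in zip(mul(s, c1), e, msg)]
    return (c0, c1), t, 7.0

def tensor(ct, ct2, moduli):
    """Steps 3-4 of Mult: reduce to the common level t, then form the extended ciphertext."""
    (c0, c1), t, nu = ct
    (c0p, c1p), t2, nu2 = ct2
    t_low = min(t, t2)
    q = moduli[t_low]

    def red(v):
        return [centered(x, q) for x in v]

    c0, c1, c0p, c1p = red(c0), red(c1), red(c0p), red(c1p)
    d0 = red(mul(c0, c0p))
    d1 = red([x + y for x, y in zip(mul(c1, c0p), mul(c0, c1p))])
    d2 = red([-x for x in mul(c1, c1p)])
    return (d0, d1, d2), t_low, nu * nu2

rng = random.Random(5)
n = 16
moduli = [1_000_003, 1_000_003 * 1_000_033]
s = [rng.choice([-1, 0, 0, 0, 1]) for _ in range(n)]
m1 = [rng.randrange(2) for _ in range(n)]
m2 = [rng.randrange(2) for _ in range(n)]
c_ext = tensor(toy_encrypt(s, m1, 1, moduli, rng), toy_encrypt(s, m2, 0, moduli, rng), moduli)

# Decrypt the extended ciphertext under (1, -s, -s^2) (key switching omitted in this sketch).
(d0, d1, d2), t, nu = c_ext
s2 = mul(s, s)
got = [centered(x - y - z, moduli[t]) % 2 for x, y, z in zip(d0, mul(s, d1), mul(s2, d2))]
assert got == [x % 2 for x in mul(m1, m2)]
```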

Scalar-Mult($c, \alpha$): Given a ciphertext $c = (c_0, c_1, t, \nu)$ representing the message $m$, and an element $\alpha \in \mathbb{A}_2$ (represented as a polynomial modulo 2 with coefficients in $\{-1, 0, 1\}$), this algorithm forms a ciphertext $c_m = (a_0, a_1, t_m, \nu_m)$ which encrypts the message $m_m = \alpha\cdot m$. This procedure is needed in our implementation of homomorphic AES, and is of more general interest for computation over finite fields.

The algorithm makes use of a procedure Randomize($\alpha$), which takes $\alpha$ and replaces each nonzero coefficient with a coefficient chosen at random from $\{-1, 1\}$. To multiply by $\alpha$, we set $\beta \leftarrow \text{Randomize}(\alpha)$ and then just multiply both $c_0$ and $c_1$ by $\beta$. Using the same argument as we used in Appendix A.5 for the distribution $\mathcal{HWT}(h)$, here too we can bound the norm of $\beta$ by $\|\beta\|^{\text{can}} \le 6\sqrt{\text{Wt}(\alpha)}$, where $\text{Wt}(\alpha)$ is the number of nonzero coefficients of $\alpha$. Hence we multiply the noise estimate by $6\sqrt{\text{Wt}(\alpha)}$, and output the resulting ciphertext $c_m = (c_0\cdot\beta, c_1\cdot\beta, t, \nu\cdot 6\sqrt{\text{Wt}(\alpha)})$.
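A toy sketch of Randomize and Scalar-Mult, under the same assumptions as the other toy sketches in this appendix (ring $\mathbb{Z}[X]/(X^n+1)$, one odd modulus, hypothetical helper names). Because $\beta \equiv \alpha \pmod 2$, the product decrypts to $\alpha\cdot m$ in $\mathbb{A}_2$.

```python
import math
import random

def centered(x, q):
    r = x % q
    return r - q if r > q // 2 else r

def mul(a, b):
    """Negacyclic product in Z[X]/(X^n + 1) (toy ring)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def randomize(alpha, rng):
    """Randomize(alpha): each nonzero coefficient becomes a uniformly random element of {-1, 1}."""
    return [rng.choice([-1, 1]) if ai % 2 else 0 for ai in alpha]

def scalar_mult(ct, alpha, moduli, rng):
    (c0, c1), t, nu = ct
    q = moduli[t]
    beta = randomize(alpha, rng)
    wt = sum(1 for ai in alpha if ai % 2)            # Wt(alpha): number of nonzero coefficients
    c0b = [centered(x, q) for x in mul(c0, beta)]
    c1b = [centered(x, q) for x in mul(c1, beta)]
    return (c0b, c1b), t, nu * 6 * math.sqrt(wt)

rng = random.Random(9)
n, q = 16, 1_000_003 * 1_000_033
moduli = [q]
s = [rng.choice([-1, 0, 0, 0, 1]) for _ in range(n)]
msg = [rng.randrange(2) for _ in range(n)]
c1 = [rng.randrange(q) - q // 2 for _ in range(n)]
e = [rng.randrange(-3, 4) for _ in range(n)]
c0 = [centered(x + 2 * ei + mi, q) for x, ei, mi in zip(mul(s, c1), e, msg)]
alpha = [rng.choice([-1, 0, 1]) for _ in range(n)]
(c0m, c1m), t, nu_m = scalar_mult(((c0, c1), 0, 7.0), alpha, moduli, rng)
got = [centered(x - y, q) % 2 for x, y in zip(c0m, mul(s, c1m))]
assert got == [x % 2 for x in mul(alpha, msg)]
```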

Automorphism($c, \kappa$): In the main body we explained how permutations on the plaintext slots can be realized via elements $\kappa \in \mathcal{G}al$; we also require the application of such automorphisms to implement the Frobenius maps in our AES implementation.

For each $\kappa$ that we want to use, we need to include in the public key the "matrix" $W[\kappa(\mathfrak{s}) \to \mathfrak{s}]$. Then, given a ciphertext $c = (c_0, c_1, t, \nu)$ representing the message $m$, the function Automorphism($c, \kappa$) produces a ciphertext $c' = (c'_0, c'_1, t, \nu')$ which represents the message $\kappa(m)$. We first set an "extended ciphertext" by setting

$$d_0 \leftarrow \kappa(c_0), \quad d_1 \leftarrow 0, \quad\text{and}\quad d_2 \leftarrow \kappa(c_1)$$

and then apply key switching to the extended ciphertext $((d_0, d_1, d_2), t, \nu)$ using the "matrix" $W[\kappa(\mathfrak{s}) \to \mathfrak{s}]$.
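In the toy power-of-two ring $\mathbb{Z}[X]/(X^n+1)$ used in the other sketches, $\kappa_k : a(X) \mapsto a(X^k)$ is an automorphism only for odd $k$. The report's rings use odd $m$, where $\kappa_2$ (Frobenius) is also available, so this is an illustration rather than the report's setting. The sketch (hypothetical names) checks the homomorphism property and builds the extended ciphertext $(\kappa(c_0), 0, \kappa(c_1))$.

```python
import random

def mul(a, b):
    """Negacyclic product in Z[X]/(X^n + 1) (toy ring)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out

def automorphism(a, k):
    """kappa_k: a(X) -> a(X^k) in Z[X]/(X^n + 1); k must be odd so that kappa_k is an automorphism."""
    n = len(a)
    if k % 2 == 0:
        raise ValueError("k must be odd for the power-of-two toy ring")
    out = [0] * n
    for i, ai in enumerate(a):
        j = (i * k) % (2 * n)          # X^(i*k), using X^(2n) = 1 and X^n = -1
        if j < n:
            out[j] += ai
        else:
            out[j - n] -= ai
    return out

def extended_from_automorphism(c0, c1, k):
    """(d0, d1, d2) = (kappa(c0), 0, kappa(c1)), valid w.r.t. (1, -s, -kappa(s))."""
    return automorphism(c0, k), [0] * len(c0), automorphism(c1, k)

rng = random.Random(11)
n = 16
a = [rng.randrange(-50, 51) for _ in range(n)]
b = [rng.randrange(-50, 51) for _ in range(n)]
for k in (3, 5, 31):
    assert automorphism(mul(a, b), k) == mul(automorphism(a, k), automorphism(b, k))
```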

C Security Analysis and Parameter Settings

Below we derive the concrete parameters for use in our early implementation. This part of the report is outdated; we left it here for historical purposes.

We begin in Appendix C.1 by deriving a lower bound on the dimension $N$ of the LWE problem underlying our key-switching matrices, as a function of the modulus and the noise variance. (This will serve as a lower bound on $\phi(m)$ for our choice of the ring polynomial $\Phi_m(X)$.) Then in Appendix C.2 we derive a lower bound on the size of the largest modulus $Q$ in our implementation, in terms of the noise variance and the dimension $N$. Then in Appendix C.3 we choose a value for the noise variance (as small as possible subject to some nominal security concerns), solve the somewhat circular constraints on $N$ and $Q$, and set all the other parameters.

C.1 Lower-Bounding the Dimension

Below we apply the LWE-security analysis of Lindner and Peikert [22], together with a few (arguably justifiable) assumptions, to analyze the dimension needed for different security levels. The analysis below assumes that we are given the modulus $Q$ and noise variance $\sigma^2$ for the LWE problem (i.e., the noise is chosen from a discrete Gaussian distribution modulo $Q$ with variance $\sigma^2$ in each coordinate). The goal is to derive a lower bound on the dimension $N$ required to get any given security level. The first assumption that we make, of course, is that the Lindner-Peikert analysis, which was done in the context of standard LWE, applies also to our ring-LWE case. We also make the following extra assumptions:

• We assume that (once $\sigma$ is not too tiny) the security depends on the ratio $Q/\sigma$ and not on $Q$ and $\sigma$ separately. Nearly all the attacks and hardness results in the literature support this assumption, with the exception of the Arora-Ge attack [2] (which works whenever $\sigma$ is very small, regardless of $Q$).

• The analysis in [22] devised an experimental formula for the time that it takes to get a particular quality of reduced basis (i.e., the parameter $\delta$ of Gama and Nguyen [12]), then provided another formula for the advantage that the attack can derive from a reduced basis at a given quality, and finally used a computer program to solve these formulas for some given values of $N$ and $\delta$. This provides some time/advantage tradeoff, since obtaining a smaller value of $\delta$ (i.e., a higher-quality basis) takes longer and provides better advantage for the attacker.

For our purposes we made the assumption that the best runtime/advantage ratio is achieved in the high-advantage regime. Namely, we should spend basically all the attack running time doing lattice reduction, in order to get a good enough basis that will break security with advantage (say) 1/2. This assumption is consistent with the results that are reported in [22].

• Finally, we assume that to get advantage close to 1/2 for an LWE instance with modulus $Q$ and noise $\sigma$, we need to be able to reduce the basis well enough until the shortest vector is of size roughly $Q/\sigma$. Again, this is consistent with the results that are reported in [22].

Given these assumptions and the formulas from [22], we can now solve the dimension/security tradeoff analytically. Because of the first assumption, we might as well simplify the equations and derive our lower bound on $N$ for the case $\sigma = 1$, where the ratio $Q/\sigma$ is equal to $Q$. (In reality we will use $\sigma \approx 4$ and increase the modulus by the same 2 bits.)

Following Gama-Nguyen [12], recall that a reduced basis $B = (b_1|b_2|\cdots|b_M)$ for a dimension-$M$, determinant-$D$ lattice (with $\|b_1\| \le \|b_2\| \le \cdots \le \|b_M\|$) has quality parameter $\delta$ if the shortest vector in that basis has norm $\|b_1\| = \delta^M\cdot D^{1/M}$. In other words, the quality of $B$ is defined as $\delta = (\|b_1\|/D^{1/M})^{1/M}$.

The time (in seconds) that it takes to compute a reduced basis of quality $\delta$ for a random LWE instance was estimated in [22] to be at least

$$\log(\text{time}) \ge 1.8/\log(\delta) - 110. \tag{7}$$

For a random $Q$-ary lattice of rank $N$, the determinant is exactly $Q^N$ whp, and therefore a quality-$\delta$ basis has $\|b_1\| = \delta^M\cdot Q^{N/M}$. By our last assumption, we should reduce the basis enough so that $\|b_1\| = Q$, so we need $Q = \delta^M\cdot Q^{N/M}$. The LWE attacker gets to choose the dimension $M$, and the best choice for this attack is obtained when the right-hand side of the last equality is minimized, namely for $M = \sqrt{N\log Q/\log\delta}$. This yields the condition

$$\log Q = \log(\delta^M Q^{N/M}) = M\log\delta + (N/M)\log Q = 2\sqrt{N\log Q\log\delta},$$

which we can solve for $N$ to get $N = \log Q/(4\log\delta)$. Finally, we can use Equation (7) to express $\log\delta$ as a function of $\log(\text{time})$, thus getting $N = \log Q\cdot(\log(\text{time}) + 110)/7.2$. Recalling that in our case we used $\sigma = 1$ (so $Q/\sigma = Q$), we get our lower bound on $N$ in terms of $Q/\sigma$. Namely, to ensure a time/advantage ratio of at least $2^k$, we need to set the rank $N$ to be at least

$$N \ge \frac{\log(Q/\sigma)\,(k + 110)}{7.2}.$$

For example, the above formula says that to get an 80-bit security level we need to set $N \ge \log(Q/\sigma)\cdot 26.4$, for a 100-bit security level we need $N \ge \log(Q/\sigma)\cdot 29.2$, and for a 128-bit security level we need $N \ge \log(Q/\sigma)\cdot 33.1$. We comment that these values are indeed consistent with the values reported in [22].

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED
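The lower bound can be evaluated directly. This sketch assumes base-2 logarithms (as in the Lindner-Peikert runtime estimate) and uses hypothetical function names. It also checks that at the critical $N = \log Q/(4\log\delta)$ the attacker's optimal $M$ is $2N$, a fact used in Appendix C.1.1.

```python
import math

def min_dimension(log2_q_over_sigma, k):
    """Lower bound N >= log(Q/sigma) * (k + 110) / 7.2 for a 2^k time/advantage ratio."""
    return log2_q_over_sigma * (k + 110) / 7.2

def log2_delta_for_time(log2_time):
    """Invert Equation (7): log(delta) = 1.8 / (log(time) + 110)."""
    return 1.8 / (log2_time + 110)

def optimal_attack_dimension(N, log2_q, log2_delta):
    """M = sqrt(N log Q / log delta), which minimizes M log delta + (N/M) log Q."""
    return math.sqrt(N * log2_q / log2_delta)

factors = {k: round((k + 110) / 7.2, 1) for k in (80, 100, 128)}
print(factors)   # {80: 26.4, 100: 29.2, 128: 33.1}

# At the critical N = log Q / (4 log delta) the attacker's best M equals 2N.
log2_q, log2_delta = 400.0, log2_delta_for_time(80)
N_crit = log2_q / (4 * log2_delta)
M_best = optimal_attack_dimension(N_crit, log2_q, log2_delta)
```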

C.1.1 LWE with Sparse Key

The analysis above applies to "generic" LWE instances, but in our case we use very sparse secret keys (with only $h = 64$ nonzero coefficients, all chosen as $\pm 1$). This brings up the question of whether one can get better attacks against LWE instances with a very sparse secret (much smaller than even the noise). We note that Goldwasser et al. proved in [17] that LWE with a low-entropy secret is as hard as standard LWE with weaker parameters (for large enough moduli). Although the specific parameters from that proof do not apply to our choice of parameters, it does indicate that weak-secret LWE is not "fundamentally weaker" than standard LWE. In terms of attacks, the only attack that we could find that takes advantage of this sparse key is by applying the reduction technique of Applebaum et al. [1] to switch the key with part of the error vector, thus getting a smaller LWE error.

In a sparse-secret LWE we are given a random $N$-by-$M$ matrix $A$ (modulo $Q$), and also an $M$-vector $y = [sA + e]_Q$. Here the $N$-vector $s$ is our very sparse secret, and $e$ is the error $M$-vector (which is also short, but not sparse and not as short as $s$).

Below let $A_1$ denote the first $N$ columns of $A$, $A_2$ the next $N$ columns, then $A_3$, $A_4$, etc. Similarly $e_1, e_2, \ldots$ are the corresponding parts of the error vector and $y_1, y_2, \ldots$ the corresponding parts of $y$. Assuming that $A_1$ is invertible (which happens with high probability), we can transform this into an LWE instance with respect to secret $e_1$, as follows.

We have $y_1 = sA_1 + e_1$, or alternatively $y_1A_1^{-1} = s + e_1A_1^{-1}$. Also, for $i > 1$ we have $y_i = sA_i + e_i$, which together with the above gives $y_1A_1^{-1}A_i - y_i = e_1A_1^{-1}A_i - e_i$. Hence if we denote

$$B_1 := A_1^{-1}, \quad\text{and for } i > 1,\quad B_i := A_1^{-1}A_i,$$

and similarly $z_1 := y_1A_1^{-1}$, and for $i > 1$, $z_i := y_1A_1^{-1}A_i - y_i$, and then set $B := (B_1|B_2|B_3|\ldots)$ and $z := (z_1|z_2|z_3|\ldots)$, and also $f := (s|-e_2|-e_3|\ldots)$, then we get the LWE instance

$$z = e_1B + f$$

with secret $e_1$. The thing that makes this LWE instance potentially easier than the original one is that the first part of the error vector $f$ is our sparse/small vector $s$, so the transformed instance has smaller error than the original (which means that it is easier to solve).
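The transformation can be checked on a tiny instance. In this sketch (not an attack implementation) $Q = 12289$, $N = 4$, $M = 3N$, and all names are hypothetical. Vectors are rows, matrices are lists of rows, and inverses modulo the prime $Q$ are computed by Gauss-Jordan elimination. It verifies $z = e_1B + f$ with $f = (s|-e_2|-e_3)$.

```python
import random

Q = 12289                                   # prime modulus for the toy instance

def mat_inv(A, q):
    """Inverse of a square matrix modulo a prime q (Gauss-Jordan); None if singular."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] % q), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)
        M[col] = [x * inv % q for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] % q:
                f = M[r][col]
                M[r] = [(x - f * y) % q for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def mat_mul(A, B, q):
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)] for row in A]

def vec_mat(v, A, q):
    """Row vector times matrix, mod q."""
    return [sum(vi * row[j] for vi, row in zip(v, A)) % q for j in range(len(A[0]))]

def cols(A, lo, hi):
    return [row[lo:hi] for row in A]

rng = random.Random(5)
N, blocks = 4, 3
M_cols = N * blocks
s = [0, 1, 0, -1]                                     # very sparse secret (N-vector)
e = [rng.randrange(-4, 5) for _ in range(M_cols)]     # short error (M-vector)
while True:
    A = [[rng.randrange(Q) for _ in range(M_cols)] for _ in range(N)]
    A1_inv = mat_inv(cols(A, 0, N), Q)
    if A1_inv is not None:
        break
y = [(u + v) % Q for u, v in zip(vec_mat(s, A, Q), e)]

# Build B = (B_1 | B_2 | ...), z = (z_1 | z_2 | ...), f = (s | -e_2 | ...).
z1 = vec_mat(y[:N], A1_inv, Q)                        # z_1 = y_1 A_1^{-1}
B_blocks, z_blocks, f = [A1_inv], [z1], list(s)
for i in range(1, blocks):
    Ai = cols(A, i * N, (i + 1) * N)
    B_blocks.append(mat_mul(A1_inv, Ai, Q))           # B_i = A_1^{-1} A_i
    zi = vec_mat(z1, Ai, Q)                           # y_1 A_1^{-1} A_i
    z_blocks.append([(u - v) % Q for u, v in zip(zi, y[i * N:(i + 1) * N])])
    f += [-x for x in e[i * N:(i + 1) * N]]
B = [sum((blk[r] for blk in B_blocks), []) for r in range(N)]
z = sum(z_blocks, [])
e1 = e[:N]
assert z == [(u + v) % Q for u, v in zip(vec_mat(e1, B, Q), f)]
```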

Trying to quantify the effect of this attack, we note that the optimal $M$ value in the attack from Appendix C.1 above is obtained at $M = 2N$, which means that the new error vector is $f = (s|-e_2)$, which has Euclidean norm smaller than $e = (e_1|e_2)$ by roughly a factor of $\sqrt{2}$ (assuming that $\|s\| \ll \|e_1\| \approx \|e_2\|$). Maybe some further improvement can be obtained by using a smaller value for $M$, where the shorter error may outweigh the "non-optimal" value of $M$. However, we do not expect to get major improvement this way, so it seems that the very sparse secret should only add maybe one bit to the modulus/noise ratio.
C.2 The Modulus Size

In this section we assume that we are given the parameter $N = \phi(m)$ (for our polynomial ring modulo $\Phi_m(X)$). We also assume that we are given the noise variance $\sigma^2$, the number of levels in the modulus chain $L$, an additional "slackness parameter" $\xi$ (whose purpose is explained below), and the number of nonzero coefficients in the secret key $h$. Our goal is to devise a lower bound on the size of the largest modulus $Q$ used in the public key, so as to maintain the functionality of the scheme.


Controlling the Noise. Driving the analysis in this section is a bound on the noise magnitude right after modulus switching, which we denote below by $B$. We set our parameters so that starting from ciphertexts with noise magnitude $B$, we can perform one level of fan-in-two multiplications, then one level of fan-in-$\xi$ additions, followed by key switching and modulus switching again, and get the noise magnitude back to the same $B$.

• Recall that in the "reduced canonical embedding norm", the noise magnitude is at most multiplied by modular multiplication and added by modular addition; hence after the multiplication and addition levels the noise magnitude grows from $B$ to as much as $\xi B^2$.

• As we have seen in Appendix B.3, performing key switching scales up the noise magnitude by a factor of $P$ and adds another noise term of magnitude up to $B_{\text{Ks}}\cdot q_t$ (before doing modulus switching to scale it back down). Hence, starting from noise magnitude $\xi B^2$, the noise grows to magnitude $P\xi B^2 + B_{\text{Ks}}\cdot q_t$ (relative to the modulus $Pq_t$).

Below we assume that after key switching we do modulus switching directly to a smaller modulus.

• After key switching we can switch to the next modulus $q_{t-1}$ to decrease the noise back to our bound $B$. Following the analysis from Appendix B.2, switching moduli from $Q_t$ to $q_{t-1}$ decreases the noise magnitude by a factor of $q_{t-1}/Q_t = 1/(P\cdot p_t)$, and then adds a noise term of magnitude $B_{\text{scale}}$.

Starting from noise magnitude $P\xi B^2 + B_{\text{Ks}}\cdot q_t$ before modulus switching, the noise magnitude after modulus switching is therefore bounded whp by


P  •  +  Hks 

P  -Pt 


qt  ^  .  BKs-qt-i  p 

- H  i>scale  — - 1 - ^ - H  .Dscale 


Pt 


P 


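Using the analysis above, our goal next is to set the parameters $B$, $P$ and the $p_t$'s (as functions of $N$, $\sigma$, $L$, $\xi$ and $h$) so that in every level $t$ we get $\frac{\xi B^2}{p_t} + \frac{B_{\text{Ks}}\cdot q_{t-1}}{P} + B_{\text{scale}} \le B$. Namely, we need to satisfy at every level $t$ the quadratic inequality (in $B$)

$$\frac{\xi}{p_t}B^2 - B + \underbrace{\left(\frac{B_{\text{Ks}}\cdot q_{t-1}}{P} + B_{\text{scale}}\right)}_{\text{denote this by } R_{t-1}} \le 0. \tag{9}$$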


Observe that (assuming that all the primes $p_t$ are roughly the same size), it suffices to satisfy this inequality for the largest modulus $t = L-2$, since $R_{t-1}$ increases with larger $t$'s. Noting that $R_{L-3} > B_{\rm scale}$, we want to get this term to be as close to $B_{\rm scale}$ as possible, which we can do by setting $P$ large enough. Specifically, to make it as close as $R_{L-3} = (1+2^{-n})B_{\rm scale}$ it is sufficient to set

$$P \approx \frac{2^n\, B_{\rm Ks}\, q_{L-3}}{B_{\rm scale}} \approx \frac{2^n\cdot 9\sigma N\, q_{L-3}}{77\sqrt{N}} \approx 2^{n-3}\, q_{L-3}\,\sigma\sqrt{N}. \qquad (10)$$

Below we set (say) $n = 8$, which makes it close enough to use just $R_{L-3} \approx B_{\rm scale}$ for the derivation below.

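Clearly, to satisfy Inequality (9) we must have a non-negative discriminant, which means $1 - 4\frac{\xi}{p_{L-2}}R_{L-3} \ge 0$, or $p_{L-2} \ge 4\xi R_{L-3}$. Using the value $R_{L-3} \approx B_{\rm scale}$, this translates into setting

$$p_1 \approx p_2 \approx \cdots \approx p_{L-2} \approx 4\xi\cdot B_{\rm scale} \approx 308\,\xi\sqrt{N}. \qquad (11)$$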

Finally, with the discriminant non-negative and all the $p_j$'s roughly the same size, we can satisfy Inequality (9) by setting

$$B \approx \frac{1}{2\xi/p_{L-2}} = \frac{p_{L-2}}{2\xi} \approx 2B_{\rm scale} \approx 154\sqrt{N}. \qquad (12)$$
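As a rough numerical sketch of Equations (11) and (12), the Python below plugs in the idealised values used in the text ($B_{\rm scale} \approx 77\sqrt{N}$ and $R_{L-3} \approx B_{\rm scale}$) and checks that one level of multiplication, addition, key switching and modulus switching maps noise $B$ back to $B$. The function names are hypothetical and the numbers are illustrative only.

```python
import math

# Idealised per-level parameters of Eqs. (11)-(12).  Assumes, as in the text,
# B_scale ~ 77*sqrt(N) and R_{L-3} ~ B_scale.  Names are hypothetical.

def chain_parameters(N, xi):
    """Return (B_scale, p, B) with p from Eq. (11) and B from Eq. (12)."""
    B_scale = 77.0 * math.sqrt(N)
    p = 4 * xi * B_scale          # Eq. (11): makes the discriminant of (9) vanish
    B = p / (2 * xi)              # Eq. (12): the (double) root of (9)
    return B_scale, p, B

def discriminant(xi, p, R):
    """Discriminant of (xi/p)*B^2 - B + R; Ineq. (9) has a solution iff it is >= 0."""
    return 1.0 - 4.0 * xi * R / p

def noise_after_level(B, xi, p, R):
    """Noise after a multiply/add level followed by key and modulus switching."""
    return xi * B * B / p + R

B_scale, p, B = chain_parameters(N=10000, xi=8)
print(B_scale, p, B)                          # 7700.0 246400.0 15400.0
print(discriminant(8, p, B_scale))            # 0.0
print(noise_after_level(B, 8, p, B_scale))    # 15400.0
```

In this idealised setting $p_{L-2}$ makes the discriminant exactly zero, so $B$ is a double root; any $R_{L-3}$ noticeably larger than $B_{\rm scale}$ would make the discriminant negative, which is why $P$ is chosen large enough to keep $R_{L-3}$ very close to $B_{\rm scale}$.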

The Smallest Modulus. After evaluating our $L$-level circuit, we arrive at the last modulus $q_0 = p_0$ with noise bounded by $\xi B^2$. To be able to decrypt, we need this noise to be smaller than $q_0/2c_m$, where $c_m$ is the ring constant for our polynomial ring modulo $\Phi_m(X)$. For our setting, that constant is always below 40, so a sufficient condition for being able to decrypt is to set

$$q_0 = p_0 \approx 2\cdot 40\cdot\xi B^2 \approx 2^{20.9}\,\xi N. \qquad (13)$$


The Encryption Modulus. Recall that freshly encrypted ciphertexts have noise $B_{\rm clean}$ (as defined in Equation (6)), which is larger than our baseline bound $B$ from above. To reduce the noise magnitude after the first modulus switching down to $B$, we therefore set the ratio $p_{L-1} = q_{L-1}/q_{L-2}$ so that $B_{\rm clean}/p_{L-1} + B_{\rm scale} \le B$. This means that we set

$$p_{L-1} = \frac{B_{\rm clean}}{B - B_{\rm scale}} \approx \frac{74N + 858\sqrt{N}}{77\sqrt{N}} \approx \sqrt{N} + 11. \qquad (14)$$


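The Largest Modulus. Having set all the parameters, we are now ready to calculate the resulting bound on the largest modulus, namely $Q_{L-2} = q_{L-2}\cdot P$. Using Equations (11) and (13), we get

$$q_{L-2} = p_0\cdot\prod_{i=1}^{L-2} p_i \approx 2^{20.9}\,\xi N\cdot\left(308\,\xi\sqrt{N}\right)^{L-2} = 2^{20.9}\cdot 308^{L-2}\cdot\xi^{L-1}\cdot N^{L/2}. \qquad (15)$$

Now using Equation (10) we have

$$P \approx 2^{5}\, q_{L-3}\,\sigma\sqrt{N} \approx 2^{25.9}\cdot 308^{L-3}\cdot\xi^{L-2}\cdot N^{(L-3)/2+1}\cdot\sigma\sqrt{N} \approx 2\cdot 308^{L}\cdot\xi^{L-2}\cdot\sigma N^{L/2},$$

and finally

$$Q_{L-2} = P\cdot q_{L-2} \approx \left(2\cdot 308^{L}\cdot\xi^{L-2}\cdot\sigma N^{L/2}\right)\cdot\left(2^{20.9}\cdot 308^{L-2}\cdot\xi^{L-1}\cdot N^{L/2}\right) \approx \sigma\cdot 2^{16.5L+5.4}\cdot\xi^{2L-3}\cdot N^{L}. \qquad (16)$$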
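The constant $2^{16.5L+5.4}$ in Equation (16) can be sanity-checked by tracking only the powers of 2 and 308 in Equations (15) and (10), assuming $n = 8$. A minimal Python sketch with hypothetical helper names, fitting $aL + b$ through two depths:

```python
import math

LOG2_308 = math.log2(308)

def log2_constants(L):
    """log2 of the purely numerical factors in Eqs. (15), (10) and (16).

    Powers of sigma, xi and N are left out; only the 2^a * 308^b parts are kept.
    Assumes n = 8, so that Eq. (10) reads P ~ 2^5 * q_{L-3} * sigma * sqrt(N).
    """
    log_q = 20.9 + (L - 2) * LOG2_308        # q_{L-2}, Eq. (15)
    log_P = 5 + 20.9 + (L - 3) * LOG2_308    # P ~ 2^5 * q_{L-3}
    return log_q, log_P, log_q + log_P       # last entry: Q_{L-2} = P * q_{L-2}

# Fit a*L + b through two depths; Eq. (16) states roughly a = 16.5, b = 5.4.
y10 = log2_constants(10)[2]
y20 = log2_constants(20)[2]
slope = (y20 - y10) / 10
intercept = y10 - 10 * slope
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The fitted slope is $2\log_2 308 \approx 16.53$; the intercept comes out slightly above 5.4 because the text rounds $2^{25.9}\cdot 308^{-3} \approx 2.14$ down to 2 when simplifying $P$.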


C.3  Putting  It  Together 

We now have in Equation (8) a lower bound on $N$ in terms of $Q$, $\sigma$ and the security level $k$, and in Equation (16) a lower bound on $Q$ with respect to $N$, $\sigma$ and several other parameters. We note that $\sigma$ is a free parameter, since it drops out when substituting Equation (16) in Equation (8). In our implementation we used $\sigma = 3.2$, which is the smallest value consistent with the analysis in [25].
For the other parameters, we set $\xi = 8$ (to get a small “wiggle room” without increasing the parameters much), and set the number of nonzero coefficients in the secret key at $h = 64$ (which is already included in the formulas from above, and should easily defeat exhaustive-search/birthday type of attacks). Substituting these values into the equations above we get

$$p_0 \approx 2^{23.9}N, \qquad p_i \approx 2^{11.3}\sqrt{N}\ \text{ for } i = 1,\ldots,L-2,$$
$$P \approx 2^{11.3L-5}\,\sigma N^{L/2} \quad\text{and}\quad Q_{L-2} \approx 2^{22.5L-3.6}\,\sigma N^{L}.$$

Substituting the last value of $Q_{L-2}$ into Equation (8) yields

$$N > \frac{\left(L(\log N + 23) - 8.5\right)(k + 110)}{7.2}.$$

Targeting $k = 80$ bits of security and solving for several different depth parameters $L$, we get the results in the table below, which also lists approximate sizes for the primes $p_i$ and $P$.


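| $L$ | $N$ | $\log_2(p_0)$ | $\log_2(p_i)$ | $\log_2(p_{L-1})$ | $\log_2(P)$ |
|---|---|---|---|---|---|
| 10 | 9326 | 37.1 | 17.9 | 7.5 | 177.3 |
| 20 | 19434 | 38.1 | 18.4 | 8.1 | 368.8 |
| 30 | 29749 | 38.7 | 18.7 | 8.4 | 564.2 |
| 40 | 40199 | 39.2 | 18.9 | 8.6 | 762.2 |
| 50 | 50748 | 39.5 | 19.1 | 8.7 | 962.1 |
| 60 | 61376 | 39.8 | 19.2 | 8.9 | 1163.5 |
| 70 | 72071 | 40.0 | 19.3 | 9.0 | 1366.1 |
| 80 | 82823 | 40.2 | 19.4 | 9.1 | 1569.8 |
| 90 | 93623 | 40.4 | 19.5 | 9.2 | 1774.5 |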
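The $N$ column above can be approximately reproduced by solving the final inequality of C.3 as a fixed point, taking $\log$ to be $\log_2$ (an assumption, though it agrees with, e.g., the rows for $L = 10$, $20$ and $90$ to within rounding). A sketch in Python; `min_dimension` and `prime_sizes` are hypothetical names:

```python
import math

def min_dimension(L, k=80, iterations=50):
    """Fixed point of N = (L*(log2 N + 23) - 8.5) * (k + 110) / 7.2."""
    N = 1000.0                                   # any positive starting guess
    for _ in range(iterations):
        N = (L * (math.log2(N) + 23) - 8.5) * (k + 110) / 7.2
    return N

def prime_sizes(N):
    """Approximate bit sizes of p_0 ~ 2^23.9 * N and p_i ~ 308 * 8 * sqrt(N) (xi = 8)."""
    return 23.9 + math.log2(N), math.log2(308 * 8) + 0.5 * math.log2(N)

for L in (10, 20, 90):
    N = min_dimension(L)
    log_p0, log_pi = prime_sizes(N)
    print(L, math.ceil(N), round(log_p0, 1), round(log_pi, 1))
```

The iteration converges quickly because the right-hand side grows only logarithmically in $N$.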

Choosing Concrete Values. Having obtained lower bounds on $N = \phi(m)$ and other parameters, we now need to fix precise cyclotomic fields $\mathbb{Q}(\zeta_m)$ to support the algebraic operations we need. We have two situations we will be interested in for our experiments. The first corresponds to performing arithmetic on bytes in $\mathbb{F}_{2^8}$ (i.e., $n = 8$), whereas the second corresponds to arithmetic on bits in $\mathbb{F}_2$ (i.e., $n = 1$). We therefore need to find an odd value of $m$, with $\phi(m) \ge N$ and $m$ dividing $2^d - 1$, where we require that $d$ is divisible by $n$. Values of $m$ with a small number of prime factors are preferred as they give rise to smaller values of $c_m$. We also look for parameters which maximize the number of slots $\ell$ we can deal with in one go, and values for which $\phi(m)$ is close to the approximate value for $N$ estimated above. When $n = 1$ we always select a set of parameters for which the $\ell$ value is at least as large as that obtained when $n = 8$.


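| $L$ | $m$ ($n=8$) | $N = \phi(m)$ | $(d, \ell)$ | $c_m$ | $m$ ($n=1$) | $N = \phi(m)$ | $(d, \ell)$ | $c_m$ |
|---|---|---|---|---|---|---|---|---|
| 10 | 11441 | 10752 | (48, 224) | 3.60 | 11023 | 10800 | (45, 240) | 5.13 |
| 20 | 34323 | 21504 | (48, 448) | 6.93 | 34323 | 21504 | (48, 448) | 6.93 |
| 30 | 31609 | 31104 | (72, 432) | 5.15 | 32377 | 32376 | (57, 568) | 1.27 |
| 40 | 54485 | 40960 | (64, 640) | 12.40 | 42799 | 42336 | (21, 2016) | 5.95 |
| 50 | 59527 | 51840 | (72, 720) | 21.12 | 54161 | 52800 | (60, 880) | 4.59 |
| 60 | 68561 | 62208 | (72, 864) | 36.34 | 85865 | 63360 | (60, 1056) | 12.61 |
| 70 | 82603 | 75264 | (56, 1344) | 36.48 | 82603 | 75264 | (56, 1344) | 36.48 |
| 80 | 92837 | 84672 | (56, 1512) | 38.52 | 101437 | 84672 | (42, 2016) | 19.13 |
| 90 | 124645 | 98304 | (48, 2048) | 21.07 | 95281 | 94500 | (45, 2100) | 6.22 |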
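For any candidate $m$, the quantities in the table are related by $N = \phi(m)$, $d = \mathrm{ord}_m(2)$ (the smallest $d$ with $m \mid 2^d - 1$) and $\ell = \phi(m)/d$ slots. A small Python helper to recompute them (hypothetical names; it factors by trial division, so it is only meant for $m$ of this size):

```python
def euler_phi(m):
    """Euler's totient via trial division (fine for m up to ~10^6)."""
    result, n, f = m, m, 2
    while f * f <= n:
        if n % f == 0:
            while n % f == 0:
                n //= f
            result -= result // f
        f += 1
    if n > 1:
        result -= result // n
    return result

def order_of_two(m):
    """Smallest d >= 1 with 2^d = 1 (mod m); m must be odd and > 1."""
    if m % 2 == 0 or m < 3:
        raise ValueError("m must be an odd integer > 1")
    d, x = 1, 2 % m
    while x != 1:
        x = (2 * x) % m
        d += 1
    return d

def slot_structure(m, n=1):
    """Return (N, d, ell) = (phi(m), ord_m(2), phi(m)/d); require n | d as in the text."""
    N, d = euler_phi(m), order_of_two(m)
    if d % n != 0:
        raise ValueError(f"d = {d} is not divisible by n = {n}")
    return N, d, N // d

print(slot_structure(101437))   # (84672, 42, 2016)
```

For example, $101437 = 7\cdot 43\cdot 337$ gives $\phi(m) = 6\cdot 42\cdot 336 = 84672$, and the orders of 2 modulo the three factors are 3, 14 and 21, whose least common multiple is $d = 42$.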
D  Scale($c$, $q_{t-1}$) in dble-CRT Representation

Let $q_i = \prod_{j=0}^{i} p_j$, where the $p_j$'s are primes that split completely in our cyclotomic field $\mathbb{A}$. We are given a $c \in \mathbb{A}_{q_t}$ represented via double-CRT, that is, it is represented as a “matrix” of its evaluations at the primitive $m$-th roots of unity modulo the primes $p_0, \ldots, p_t$. We want to modulus switch to $q_{t-1}$, i.e., scale down by a factor of $p_t$. Let's recall what this means: we want to output $c' \in \mathbb{A}$, represented via double-CRT format (as its matrix of evaluations modulo the primes $p_0, \ldots, p_{t-1}$), such that

1. $c' = c \bmod 2$.

2. $c'$ is very close (in terms of its coefficient vector) to $c/p_t$.

In the main body we explained how this could be performed in dble-CRT representation. This made explicit use of the fact that the two ciphertexts need to be equivalent modulo two. If we wished to replace two with a general prime $p$, then things are a bit more complicated. For completeness, although it is not required in our scheme, we present a methodology below. In this case, the conditions on $c''$ are as follows:

1. $c'' = c\cdot p_t \bmod p$.

2. $c''$ is very close to $c$.

3. $c''$ is divisible by $p_t$.

As before, we set $c' \leftarrow c''/p_t$. (Note that for $p = 2$, we trivially have $c\cdot p_t = c \bmod p$, since $p_t$ will be odd.)

This causes some complications, because we set $c'' \leftarrow c + \delta$, where $\delta = -c \bmod p_t$ (as before) but now $\delta = (p_t - 1)\cdot c \bmod p$. To compute such a $\delta$, we need to know $c \bmod p$. Unfortunately, we don't have $c \bmod p$. One not-very-satisfying way of dealing with this problem is the following. Set $\hat{c} \leftarrow [p_t]_p\cdot c \bmod q_t$. Now, if $c$ encrypted $m$, then $\hat{c}$ encrypts $[p_t]_p\cdot m$, and $\hat{c}$'s noise is $|[p_t]_p| < p/2$ times as large. It is obviously easy to compute $\hat{c}$'s double-CRT format from $c$'s. Now, we set $c''$ so that the following is true:

1. $c'' = \hat{c} \bmod p$.

2. $c''$ is very close to $\hat{c}$.

3. $c''$ is divisible by $p_t$.

This is easy to do. The algorithm to output $c''$ in double-CRT format is as follows:

1. Set $\bar{c}$ to be the coefficient representation of $\hat{c} \bmod p_t$. (Computing this requires a single “small FFT” modulo the prime $p_t$.)

2. Set $\delta$ to be the polynomial with coefficients in $(-p_t\cdot p/2,\ p_t\cdot p/2]$ such that $\delta = 0 \bmod p$ and $\delta = -\bar{c} \bmod p_t$.

3. Set $c'' = \hat{c} + \delta$, and output $c''$'s double-CRT representation.

(a) We already have $\hat{c}$'s double-CRT representation.

(b) Computing $\delta$'s double-CRT representation requires $t$ “small FFTs” modulo the $p_j$'s.
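The three steps above can be sketched on plain integer coefficient vectors. This toy version ignores the double-CRT representation and the reduction mod $q_t$ (both simplifications made for readability), and `scale_general_p` and `centered` are hypothetical helpers. The CRT step takes $\delta = p\cdot x$ with $x \equiv -\hat{c}\,p^{-1} \pmod{p_t}$, which is $0 \bmod p$ and $-\bar{c} \bmod p_t$ at the same time.

```python
def centered(x, q):
    """Representative of x mod q in the interval (-q/2, q/2]."""
    r = x % q
    return r - q if 2 * r > q else r

def scale_general_p(c, p_t, p):
    """Toy coefficient-domain version of the procedure (no double-CRT, no mod q_t).

    c   : integer coefficients of the ciphertext element
    p_t : the prime dropped from the modulus chain
    p   : plaintext modulus, assumed coprime to p_t
    Returns c' = c''/p_t, which satisfies c' = c (mod p).
    """
    factor = centered(p_t, p)                 # [p_t]_p
    inv_p = pow(p, -1, p_t)                   # p^{-1} mod p_t (Python 3.8+)
    M = p * p_t
    result = []
    for a in c:
        a_hat = factor * a                    # coefficient of c^ = [p_t]_p * c
        # delta = 0 (mod p), delta = -a_hat (mod p_t), delta in (-M/2, M/2]
        delta = centered(p * ((-a_hat * inv_p) % p_t), M)
        c2 = a_hat + delta                    # coefficient of c''
        assert c2 % p_t == 0                  # condition 3: divisible by p_t
        result.append(c2 // p_t)              # c' = c'' / p_t
    return result
```

Since $p_t c' = c'' \equiv \hat{c} \equiv p_t c \pmod p$ and $p_t$ is invertible mod $p$, the output agrees with $c$ modulo $p$, while $|c'' - \hat{c}| = |\delta| \le p\,p_t/2$ keeps it close to $\hat{c}/p_t$.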
E  Other Optimizations


Some other optimizations that we encountered during our implementation work are discussed next. Not all of these optimizations are useful for our current implementation, but they may be useful in other contexts.

Three-way Multiplications. Sometimes we need to multiply several ciphertexts together, and if their number is not a power of two then we do not have a complete binary tree of multiplications, which means that at some point in the process we will have three ciphertexts that we need to multiply together.

The standard way of implementing this 3-way multiplication is via two 2-argument multiplications, e.g., $x\cdot(y\cdot z)$. But it turns out that here it is better to use “raw multiplication” to multiply these three ciphertexts (as done in [7]), thus getting an “extended” ciphertext with four elements, then apply key-switching (and later modulus switching) to this ciphertext. This takes only six ring-multiplication operations (as opposed to eight according to the standard approach), three modulus-switching operations (as opposed to four), and only one key switching (applied to this 4-element ciphertext) rather than two (which are applied to 3-element extended ciphertexts). All in all, this three-way multiplication takes roughly 1.5 times as long as a standard two-element multiplication.

We stress that this technique is not useful for larger products, since for more than three multiplicands the noise begins to grow too large. But with only three multiplicands we get noise of roughly $B^3$ after the multiplication, which can be reduced to noise $\approx B$ by dropping two levels, and this is also what we get by using two standard two-element multiplications.
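To illustrate the algebra behind the 4-element “extended” ciphertext, the following Python sketch treats ciphertext components as plain integers mod $q$ (a toy stand-in for ring elements; names and parameters are hypothetical) and checks that the raw product of three 2-element ciphertexts decrypts under $(1, s, s^2, s^3)$ to the product of the three individual decryptions. It uses schoolbook multiplication, so it does not attempt to reproduce the operation counts quoted above.

```python
import random

def raw_mul(a, b, q):
    """'Raw' (tensor) product of extended ciphertexts viewed as polynomials in s.

    A k-element ciphertext (c_0, ..., c_{k-1}) decrypts via sum_i c_i * s^i, so the
    product of a k- and an l-element ciphertext has k + l - 1 elements.
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % q
    return out

def phase(c, s, q):
    """<c, (1, s, s^2, ...)> mod q, i.e. the pre-rounding decryption value."""
    return sum(ci * pow(s, i, q) for i, ci in enumerate(c)) % q

q, s = 2**31 - 1, 12345                      # toy modulus and secret, not secure
rng = random.Random(1)
x, y, z = ([rng.randrange(q), rng.randrange(q)] for _ in range(3))
xyz = raw_mul(raw_mul(x, y, q), z, q)        # one 4-element extended ciphertext
print(len(xyz), phase(xyz, s, q) == phase(x, s, q) * phase(y, s, q) * phase(z, s, q) % q)  # 4 True
```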

Commuting Automorphisms and Multiplications. Recalling that the automorphisms $X \mapsto X^i$ commute with the arithmetic operations, we note that some orderings of these operations can sometimes be better than others. For example, it may be better to perform the multiplication-by-constant before the automorphism operation whenever possible. The reason is that if we perform the multiply-by-constant after the key-switching that follows the automorphism, then the added noise term due to that key-switching is multiplied by the same constant, thereby making the noise slightly larger. We note that to move the multiplication-by-constant before the automorphism, we need to multiply by a different constant.
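The “different constant” is $\kappa_i^{-1}(b) = \kappa_{i^{-1}}(b)$: since $\kappa_i$ is a ring automorphism, $b\cdot\kappa_i(c) = \kappa_i(\kappa_{i^{-1}}(b)\cdot c)$. A toy Python check over $\mathbb{Z}_q[X]/(X^N+1)$ with $N$ a power of two (an assumption made for simplicity; the scheme above works modulo $\Phi_m(X)$ for odd $m$), with hypothetical helper names:

```python
import random

def negacyclic_mul(a, b, q):
    """Product in Z_q[X]/(X^N + 1), coefficients listed from X^0 to X^(N-1)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] = (out[k] + a[i] * b[j]) % q
            else:                               # X^N = -1
                out[k - n] = (out[k - n] - a[i] * b[j]) % q
    return out

def automorphism(a, i, q):
    """kappa_i: X -> X^i on Z_q[X]/(X^N + 1), for odd i."""
    n = len(a)
    out = [0] * n
    for j, coeff in enumerate(a):
        k = (i * j) % (2 * n)                   # X^(2N) = 1
        if k < n:
            out[k] = (out[k] + coeff) % q
        else:                                   # X^k = -X^(k-N)
            out[k - n] = (out[k - n] - coeff) % q
    return out

N, q, i = 8, 97, 3
rng = random.Random(7)
b = [rng.randrange(q) for _ in range(N)]         # the constant
c = [rng.randrange(q) for _ in range(N)]         # stands in for a ciphertext part
b_moved = automorphism(b, pow(i, -1, 2 * N), q)  # the "different constant"

after = negacyclic_mul(b, automorphism(c, i, q), q)          # automorphism, then multiply
before = automorphism(negacyclic_mul(b_moved, c, q), i, q)   # multiply, then automorphism
print(after == before)
```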

Switching to higher-level moduli. We note that it may be better to perform automorphisms at a higher level, in order to make the added noise term due to key-switching small with respect to the modulus. On the other hand, operations at high levels are more expensive than the same operations at a lower level. A good rule of thumb is to perform the automorphism operations one level above the lowest one. Namely, if the naive strategy that never switches to higher-level moduli would perform some Frobenius operation at level $q_i$, then we perform the key-switching following this Frobenius operation at level $Q_{i+1}$, and then switch back to level $q_{i+1}$ (rather than using $Q_i$ and $q_i$).

Commuting Addition and Modulus-switching. When we need to add many terms that were obtained from earlier operations (and their subsequent key-switching), it may be better to first add all of these terms relative to the large modulus $Q_i$ before switching the sum down to the smaller $q_i$ (as opposed to switching all the terms individually to $q_i$ and then adding).
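A toy illustration of why adding first can help: each modulus switch contributes its own rounding error (the source of the $B_{\rm scale}$ term), so switching $k$ terms separately can accumulate up to $k/2$ of rounding error, against at most $1/2$ for a single switch of the sum. The Python below uses integers as stand-ins for coefficients; the names are hypothetical.

```python
from fractions import Fraction

def round_div(x, p):
    """Nearest integer to x/p (ties rounded up); the error is at most 1/2."""
    return (2 * x + p) // (2 * p)

def switching_errors(terms, p):
    """Rounding error of 'scale each term, then add' vs 'add, then scale once'."""
    exact = Fraction(sum(terms), p)
    each_then_add = sum(round_div(t, p) for t in terms)
    add_then_once = round_div(sum(terms), p)
    return abs(each_then_add - exact), abs(add_then_once - exact)

print(switching_errors([4] * 10, 10))   # (Fraction(4, 1), Fraction(0, 1))
```

Here each term $4/10$ rounds down to 0, so the ten separate roundings lose the whole sum of 4, while scaling the sum once is exact.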

Reducing the number of key-switching matrices. When using many different automorphisms $\kappa_i : X \mapsto X^i$ we need to keep many different key-switching matrices in the public key, one for every value of $i$ that we use. We can reduce this memory requirement, at the expense of taking longer to perform the automorphisms. We use the fact that the Galois group $\mathcal{G}al$ that contains all the maps $\kappa_i$ (which is isomorphic to $(\mathbb{Z}/m\mathbb{Z})^*$) is generated by a relatively small number of generators. (Specifically, for our choice of parameters the group $(\mathbb{Z}/m\mathbb{Z})^*$ has two or three generators.) It is therefore enough to store in the public key only the key-switching matrices corresponding to the $\kappa_{g_j}$'s for these generators $g_j$ of the group $\mathcal{G}al$. Then in order to apply a map $\kappa_i$ we express it as a product of the generators and apply these generators to get the effect of $\kappa_i$. (For example, if $i = g_1^2\cdot g_2$ then we need to apply $\kappa_{g_1}$ twice followed by a single application of $\kappa_{g_2}$.)
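Expressing $i$ in terms of the stored generators can be done by brute force over the generators' orders, which is cheap when there are only two or three of them. A Python sketch (hypothetical names; $m = 15$ with generators 2 and 11 is just a small example, not a parameter from the paper):

```python
from itertools import product
from math import gcd

def multiplicative_order(g, m):
    """Order of g in (Z/mZ)*; g must be coprime to m."""
    if gcd(g, m) != 1:
        raise ValueError("g must be a unit mod m")
    k, x = 1, g % m
    while x != 1:
        x = (x * g) % m
        k += 1
    return k

def decompose(i, m, gens):
    """Exponents (e_1, ..., e_r) with i = prod g_j^e_j (mod m), found by brute force."""
    if gcd(i, m) != 1:
        raise ValueError("i must be a unit mod m")
    orders = [multiplicative_order(g, m) for g in gens]
    for exps in product(*(range(o) for o in orders)):
        acc = 1
        for g, e in zip(gens, exps):
            acc = (acc * pow(g, e, m)) % m
        if acc == i % m:
            return exps
    raise ValueError("i is not generated by gens")

print(decompose(7, 15, [2, 11]))   # (1, 1): apply kappa_2 once, then kappa_11 once
```

The returned exponents say how many times to apply each stored $\kappa_{g_j}$; for instance $7 \equiv 2^1\cdot 11^1 \pmod{15}$.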
Better  Bootstrapping  in  Fully  Homomorphic  Encryption


Craig Gentry¹, Shai Halevi¹, and Nigel P. Smart²

¹ IBM T.J. Watson Research Center
² Dept. Computer Science, University of Bristol


Abstract. Gentry’s bootstrapping technique is currently the only known method of obtaining a “pure” fully homomorphic encryption (FHE) scheme, and it may offer performance advantages even in cases that do not require pure FHE (e.g., when using the noise-control technique of Brakerski-Gentry-Vaikuntanathan).
The  main  bottleneck  in  bootstrapping  is  the  need  to  evaluate  homomorphically 
the  reduction  of  one  integer  modulo  another.  This  is  typically  done  by  emulating  a 
binary  modular  reduction  circuit,  using  bit  operations  on  binary  representation  of 
integers.  We  present  a  simpler  approach  that  bypasses  the  homomorphic  modular- 
reduction  bottleneck  to  some  extent,  by  working  with  a  modulus  very  close  to  a 
power  of  two.  Our  method  is  easier  to  describe  and  implement  than  the  generic 
binary  circuit  approach,  and  we  expect  it  to  be  faster  in  practice  (although  we  did 
not  implement  it  yet).  In  some  cases  it  also  allows  us  to  store  the  encryption  of 
the  secret  key  as  a  single  ciphertext,  thus  reducing  the  size  of  the  public  key. 

We also show how to combine our new method with the SIMD homomorphic computation techniques of Smart-Vercauteren and Gentry-Halevi-Smart, to get a bootstrapping method that works in time quasi-linear in the security parameter. This last part requires extending the techniques from prior work to handle arithmetic not only over fields, but also over some rings. (Specifically, our method uses arithmetic modulo a power of two, rather than over characteristic-two fields.)
1  Introduction


Fully Homomorphic Encryption (FHE) [12, 7] is a powerful technique to enable a party to compute an arbitrary function on a set of encrypted inputs, and hence obtain the encryption of the function’s output. Starting from Gentry’s breakthrough result [6, 7], all known FHE schemes are constructed from Somewhat Homomorphic Encryption (SWHE) schemes, which can only evaluate functions of bounded complexity. The ciphertexts in these SWHE schemes include some “noise” to ensure security, and this noise grows when applying homomorphic operations until it becomes so large that it overwhelms the decryption algorithm and causes decryption errors. To overcome the growth of noise, Gentry used a bootstrapping transformation, where the decryption procedure is run homomorphically on a given ciphertext, using an encryption of the secret key that can be found in the public key,¹ resulting in a new ciphertext that encrypts the same message but has potentially smaller noise.

Over the last two years there has been a considerable amount of work on developing new constructions and optimizations [5, 13, 9, 3, 14, 2, 8, 1, 11], but all of these constructions still have noise that keeps growing and must be reduced before it overwhelms the decryption procedure. The techniques of Brakerski et al. [1] yield SWHE schemes where the noise grows slower, only linearly with the depth of the circuit being evaluated, but for any fixed public key one can still only evaluate circuits of fixed depth. The only known way to get “pure” FHE that can evaluate arbitrary functions with a fixed public key is by using bootstrapping. Also, bootstrapping can be used in conjunction with the techniques from [1] to get better parameters (and hence faster homomorphic evaluation), as described in [1, 11].

In nearly all SWHE schemes in the literature that support bootstrapping, decryption is computed by evaluating some ciphertext-dependent linear operation on the secret key, then reducing the result modulo a public odd modulus $q$ into the range $(-q/2, q/2]$, and then taking the least significant bit of the result. Namely, denoting reduction modulo $q$ by $[\cdot]_q$, we decrypt a ciphertext $c$ by computing $a = [[L_c(s)]_q]_2$ where $L_c$ is a linear function and $s$ is the secret key. Given an encryption of the secret key $s$, computing an encryption of $L_c(s)$ is straightforward, and the bulk of the work in homomorphic decryption is devoted to reducing the result modulo $q$. This is usually done by computing encryptions of the bits in the binary representation of $L_c(s)$ and then emulating the binary circuit that reduces modulo $q$.

The starting point of this work is the observation that when $q$ is very close to a power of two, the decryption formula takes a particularly simple form. Specifically, we can compute the linear function $L_c(s)$ modulo a power of two, and then XOR the top and bottom bits of the result. We then explain how to implement this simple decryption formula homomorphically, and also how the techniques of Gentry et al. from [11] can be used to compute this homomorphic decryption with only polylogarithmic overhead.

¹ This transformation relies on the underlying SWHE being circularly secure.

We note that applying the techniques from [11] to bootstrapping is not quite straightforward, because the input and output are not presented in the correct form for these techniques. (This holds both for the standard approach of emulating a binary mod-$q$ circuit and for our new approach.) Also, for our case we need to extend the results from [11] slightly, since we are computing a function over a ring (modulo a power of two) and not over a field.

We point out that in all work prior to [11], bootstrapping required adding to the public key many ciphertexts, encrypting the individual bits (or coefficients) of the secret key. This resulted in very large public keys, of size at least $\lambda^2\cdot\mathrm{polylog}(\lambda)$ (where $\lambda$ is the security parameter). Using the techniques from [14, 1, 11], it is possible to encrypt the secret key in a “packed” form, hence reducing the number of ciphertexts to $O(\log\lambda)$ (so we can get public keys of size quasi-linear in $\lambda$). Using our technique from this work, it is even possible to store an encryption of the secret key as a single ciphertext, as described in Section 4. We next outline our main bootstrapping technique in a little more detail.

Our method applies mainly to “leveled” schemes that use the noise control mechanism of Brakerski-Gentry-Vaikuntanathan [1].² Below and throughout this paper we concentrate on the BGV ring-LWE-based scheme, since it offers the most efficient homomorphic operations and the most room for optimizations.³ The scheme is defined over a ring $R = \mathbb{Z}[X]/F(X)$ for a monic, irreducible polynomial $F(X)$ (over the integers $\mathbb{Z}$). For an arbitrary integer modulus $n$ (not necessarily prime) we denote the ring $R_n \stackrel{\rm def}{=} R/nR = (\mathbb{Z}/n\mathbb{Z})[X]/F(X)$. The scheme is parametrized by the number of levels that it can handle, which we denote by $L$, and by a set of decreasing odd moduli $q_0 \gg q_1 \gg \cdots \gg q_L$, one for each level.

The plaintext space is given by the ring $R_2$, while the ciphertext space for the $i$'th level consists of vectors in $(R_{q_i})^2$. Secret keys are polynomials $s \in R$ with “small” coefficients, and we view $s$ as the second element of the 2-vector $\mathbf{s} = (1, s)$. A level-$i$ ciphertext $\mathbf{c} = (c_0, c_1)$ encrypts a plaintext polynomial $m \in R_2$ with respect to $\mathbf{s} = (1, s)$ if we have the equality over $R$, $[\langle\mathbf{c}, \mathbf{s}\rangle]_{q_i} = [c_0 + s\cdot c_1]_{q_i} \equiv m \pmod 2$, and moreover the polynomial $[c_0 + s\cdot c_1]_{q_i}$ is “small”, i.e., all its coefficients are considerably smaller than $q_i$. Roughly, that polynomial is considered the “noise” in the ciphertext, and its coefficients grow as homomorphic operations are performed.⁴ The crux of the noise-control technique from [1] is that a level-$i$ ciphertext can be publicly converted into a level-$(i+1)$ ciphertext (with respect to the same secret key), and that this transformation reduces the noise in the ciphertext roughly by a factor of $q_{i+1}/q_i$.

² Our method can be used also with other schemes, as long as the scheme allows us to choose a modulus very close to a power of two. For example, it can be used with the schemes from [3, 2].

³ Our description of the BGV cryptosystem below assumes modulo-2 plaintext arithmetic; generalizing to modulo-$p$ arithmetic for other primes $p > 2$ is straightforward.
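The decryption rule $[[\langle\mathbf{c},\mathbf{s}\rangle]_{q_i}]_2$ can be seen on a deliberately insecure toy in which the ring $R$ is replaced by the integers (degree-0 polynomials) and the noise is $m + 2e$ for a small $e$. All names and parameters here are hypothetical and chosen only to make the arithmetic visible:

```python
import random

def centered(x, q):
    """[x]_q: the representative of x mod q in (-q/2, q/2]."""
    r = x % q
    return r - q if 2 * r > q else r

def encrypt_bit(m, s, q, rng, noise_bound=10):
    """Toy 'ciphertext' (c0, c1) with [c0 + s*c1]_q = m + 2e for a small e."""
    e = rng.randrange(-noise_bound, noise_bound + 1)
    c1 = rng.randrange(q)
    c0 = (m + 2 * e - s * c1) % q
    return c0, c1

def decrypt_bit(c, s, q):
    """[[<c, (1, s)>]_q]_2."""
    c0, c1 = c
    return centered(c0 + s * c1, q) % 2

q, s = 10007, 3                        # toy odd modulus and small secret
rng = random.Random(0)
bits = [rng.randrange(2) for _ in range(20)]
print(all(decrypt_bit(encrypt_bit(m, s, q, rng), s, q) == m for m in bits))
```

Decryption works as long as the noise $m + 2e$ stays below $q/2$ in magnitude, which is exactly the “small noise” condition above.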

Secret keys too are associated with levels, and the public key includes some additional information that (roughly speaking) makes it possible to convert a ciphertext with respect to a level-$i$ key $s_i$ into a ciphertext with respect to a level-$(i+1)$ key $s_{i+1}$. In what follows we will only be interested in the secret keys at level $L$ and level zero, which we will denote by $s$ and $\tilde{s}$ respectively to ease notation.

For bootstrapping, we have as input a level-$L$ ciphertext (i.e., a vector $\mathbf{c} \in (R/q_L R)^2$ modulo the smallest modulus $q_L$). This means that the noise-control technique can no longer be applied to reduce the noise, hence (essentially) no more homomorphic operations can be performed on this ciphertext. To enable further computation, we must therefore “recrypt” the ciphertext $\mathbf{c}$, to obtain a new ciphertext that encrypts the same element of $R$ with respect to some lower level $i < L$.

Our first observation is that the decryption at level $L$ can be made more efficient when $q_L$ is close to a power of two, specifically $q_L = 2^r + 1$ for an integer $r$, and moreover the coefficients of $Z = \langle\mathbf{c}, \mathbf{s}\rangle \bmod F(X)$ are much smaller than $q_L^2$ in magnitude. In particular, if $z$ is one of the coefficients of the polynomial $Z$ then $[[z]_{q_L}]_2$ can be computed as $z\langle r\rangle \oplus z\langle 0\rangle$, where $z\langle i\rangle$ is the $i$'th bit of $z$.

To evaluate the decryption formula homomorphically, we temporarily extend the plaintext space to polynomials modulo $2^{r+1}$ (rather than modulo 2). The level-$L$ secret key is $\mathbf{s} = (1, s)$, where all the coefficients of $s$ are small (in the interval $(-2^r, +2^r)$). We can therefore consider $s$ as a plaintext polynomial in $R/2^{r+1}R$, encrypt it inside a level-0 ciphertext, and keep that ciphertext in the public key. Thus, given the level-$L$ ciphertext $\mathbf{c}$, we can evaluate the inner product $[\langle\mathbf{c}, \mathbf{s}\rangle \bmod F(X)]$ homomorphically, obtaining a level-0 ciphertext that encrypts the polynomial $Z$.

For simplicity, assume for now that what we get is an encryption of all the coefficients of $Z$ separately. Given an encryption of a coefficient $z$ of $Z$ (which is an element in $\mathbb{Z}/2^{r+1}\mathbb{Z}$) we show in Section 3.1 how to extract (encryptions of) the zero'th and $r$'th bit using a data-oblivious algorithm. Hence we can finally recover a new ciphertext, encrypting the same binary polynomial at a lower level $i < L$.

⁴ We ignore here the encryption procedure, since it does not play any role in the current work.

To achieve efficient bootstrapping, we exploit the ability to perform operations on elements modulo $2^{r+1}$ in a SIMD fashion (Single Instruction Multiple Data), much like in prior work [14, 1, 11]. Some care must be taken when applying these techniques in our case, since the inputs and outputs of the bootstrapping procedure are not in the correct format: specifically, these techniques require that inputs and outputs be represented using polynomial Chinese Remainders (CRT representation), whereas decryption (and therefore recryption) inherently deals with polynomials in coefficient representation. We therefore must use explicit conversion to CRT representation, and ensure that these conversions are efficient enough. See details in Section 4.

Also, the techniques from prior work must be extended somewhat to be usable in our case: prior work demonstrated that SIMD operations can be performed homomorphically when the underlying arithmetic is over a field, but in our case we have operations over the ring $\mathbb{Z}/2^{r+1}\mathbb{Z}$, which is not a field. The algebra needed to extend the SIMD techniques to this case is essentially an application of the theory of local fields [4]. We prove many of the basic results that we need in the full version [10], and refer the reader to [4] for a general introduction and more details.

Notations. Throughout the paper we denote by $[z]_q$ the reduction of $z \bmod q$ into the interval $(-\frac{q}{2}, \frac{q}{2}]$. We also denote the $i$'th bit in the binary representation of the integer $z$ by $z\langle i\rangle$. Similarly, when $a$ is an integer polynomial of degree $d$ with coefficients $(a_0, a_1, \ldots, a_d)$, we denote by $a\langle i\rangle$ the 0-1 degree-$d$ polynomial whose coefficients are all the $i$'th bits $(a_0\langle i\rangle, a_1\langle i\rangle, \ldots, a_d\langle i\rangle)$. If $\mathbf{c}, \mathbf{s}$ are two same-dimension vectors, then $\langle\mathbf{c}, \mathbf{s}\rangle$ denotes their inner product.
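For readers who want to experiment, the notation above maps directly to a few Python helpers (hypothetical names). For negative $z$ the bit $z\langle i\rangle$ is taken from the two's-complement representation, which is how Python's `>>` behaves; that convention is an assumption here, since the text treats the negative case separately.

```python
def reduce_centered(z, q):
    """[z]_q: reduction of z mod q into (-q/2, q/2]."""
    r = z % q
    return r - q if 2 * r > q else r

def bit(z, i):
    """z<i>: the i'th bit of z (two's-complement convention if z < 0)."""
    return (z >> i) & 1

def poly_bit(a, i):
    """a<i>: 0-1 polynomial of the i'th bits of the coefficients (a_0, ..., a_d)."""
    return [bit(coeff, i) for coeff in a]

def inner(c, s):
    """<c, s> for two same-dimension vectors."""
    if len(c) != len(s):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(c, s))

print(reduce_centered(8, 5), bit(6, 1), poly_bit([5, 2, 7], 0), inner([1, 2], [3, 4]))  # -2 1 [1, 0, 1] 11
```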

Organization. We begin by presenting the simplified decryption formula in Section 2 and explain how to evaluate it homomorphically in Section 3. Then in Section 4 we recall some algebra and explain how to use techniques similar to [11] to run bootstrapping in time quasi-linear in the security parameter. Some of the proofs are omitted here; these are found in the full version of this work [10].

2  A  simpler  decryption  formula 

When the small modulus $q_L$ has a special form, i.e., when it equals $u\cdot 2^r + v$ for some integer $r$ and for some small positive odd integers $u, v$, the mod-$q_L$ decryption formula can be made to have a particularly simple form. Below we focus on the case of $q_L = 2^r + 1$, which suffices for our purposes.

So, assume that $q_L = 2^r + 1$ for some integer $r$ and that we decrypt by setting $a \leftarrow [[\langle\mathbf{c}, \mathbf{s}\rangle \bmod F(X)]_{q_L}]_2$. Consider now the coefficients of the integer polynomial $Z = \langle\mathbf{c}, \mathbf{s}\rangle \bmod F(X)$, without the reduction mod $q_L$. Since $s$ has small coefficients (and we assume that reduction mod $F(X)$ does not increase the coefficients by much), all the coefficients of $Z$ are much smaller than $q_L^2$. Consider one of these integer coefficients, denoted by $z$, so we know that $|z| \ll q_L^2 \approx 2^{2r}$. We consider the binary representation of $z$ as a $2r$-bit integer, and assume for now that $z \ge 0$ and also $[z]_{q_L} \ge 0$. We claim that in this case, the bit $[[z]_{q_L}]_2$ can be computed simply as the sum of the lowest bit and the $r$'th bit of $z$, i.e., $[[z]_{q_L}]_2 = z\langle r\rangle \oplus z\langle 0\rangle$. (Recall that $z\langle i\rangle$ is the $i$'th bit of $z$.)

Lemma 1. Let $q = 2^r + 1$ for a positive integer $r$, and let $z$ be a non-negative integer smaller than $\frac{q^2}{2} - q$, such that $[z]_q$ is also non-negative, $[z]_q \in [0, \frac{q}{2}]$. Then $[[z]_q]_2 = z\langle r\rangle \oplus z\langle 0\rangle$.

Proof. Let $z_0 = [z]_q \in [0, q/2]$ and consider the sequence of integers $z_i = z_0 + iq$ for $i = 0, 1, 2, \ldots$. Since we assume that $z \ge 0$, $z$ can be found in this sequence, say the $k$'th element $z = z_k = z_0 + kq$. Also, since $z < q^2/2 - q$, we have $k = \lfloor z/q\rfloor < q/2 - 1$. The bit that we want to compute is $[[z]_q]_2 = z_0\langle 0\rangle$. We claim that $z_0\langle 0\rangle = z_k\langle 0\rangle + z_k\langle r\rangle \pmod 2$. This is because $z_k = z_0 + kq = z_0 + k(2^r + 1) = (z_0 + k) + k2^r$, which in particular means that $z_k\langle 0\rangle = z_0\langle 0\rangle + k\langle 0\rangle \pmod 2$. But since $0 \le z_0 \le q/2$ and $0 \le k < q/2 - 1$, we have $0 \le z_0 + k < q - 1 = 2^r$, so there is no carry bit from the addition $z_0 + k$ to the $r$'th bit position. It follows that the $r$'th bit of $z_k$ is equal to the $0$'th bit of $k$ (i.e., $z_k\langle r\rangle = k\langle 0\rangle$), and therefore $z_k\langle 0\rangle = z_0\langle 0\rangle + k\langle 0\rangle = z_0\langle 0\rangle + z_k\langle r\rangle \pmod 2$, which implies that $z_0\langle 0\rangle = z_k\langle 0\rangle + z_k\langle r\rangle \pmod 2$, as needed. □
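Lemma 1 lends itself to a quick numerical sanity check. The Python sketch below (helper names such as `check_lemma1` are ours, introduced only for illustration) enumerates, for small $r$, every $z$ allowed by the lemma and compares $[[z]_q]_2$ with $z\langle r\rangle \oplus z\langle 0\rangle$; it illustrates the statement but does not replace the proof.

```python
def bit(z: int, i: int) -> int:
    """The i'th bit z<i> of z (two's-complement bits when z < 0)."""
    return (z >> i) & 1


def centered_mod(z: int, q: int) -> int:
    """[z]_q: the representative of z modulo an odd q in (-q/2, q/2]."""
    t = z % q
    return t - q if t > q // 2 else t


def check_lemma1(r: int) -> int:
    """Check Lemma 1 for q = 2^r + 1; return how many values of z were tested."""
    q = 2**r + 1
    tested = 0
    for z in range(q * q):
        if 2 * z >= q * q - 2 * q:          # stop once z >= q^2/2 - q
            break
        zq = centered_mod(z, q)
        if zq < 0:                          # Lemma 1 also requires [z]_q >= 0
            continue
        assert zq % 2 == bit(z, r) ^ bit(z, 0), (r, z)
        tested += 1
    return tested


for r in range(1, 9):
    print(r, check_lemma1(r))
```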

We note that the proof can easily be extended to the case $q = u2^r + v$, if the bound on $z$ is strengthened by a factor of $v$. To remove the assumption that both $z$ and $[z]_q$ are non-negative, we use the following easy corollary:

Corollary 1. Let $r \ge 3$ and $q = 2^r + 1$, and let $z$ be an integer with absolute value smaller than $q^2/4 - q$, such that $[z]_q \in (-q/4, q/4)$. Then $[[z]_q]_2 = z\langle r\rangle \oplus z\langle r-1\rangle \oplus z\langle 0\rangle$.

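Proof. Denoting $z' = z + (q^2-1)/4 = z + (q+1)(q-1)/4 = \left(z + \tfrac{q-1}{4}\right) + q\cdot\tfrac{q-1}{4}$, we have $z' \equiv z + \tfrac{q-1}{4} \pmod q$ (since $\tfrac{q-1}{4} = 2^{r-2}$ is an integer). Moreover, since $[z]_q \in (-q/4, q/4)$, we have $[z]_q + \tfrac{q-1}{4} \in [0, q/2]$, hence $[z']_q = [z]_q + \tfrac{q-1}{4}$ (over the integers), and as $\tfrac{q-1}{4}$ is an even integer (this is where $r \ge 3$ is used), $[z]_q \equiv [z']_q \pmod 2$, or in other words $[[z]_q]_2 = [[z']_q]_2$. Since $z > -q^2/4$ and $z$ is an integer, $z \ge -(q^2-1)/4$, and therefore $z' = z + (q^2-1)/4 \ge 0$. Also $z' < (q^2/4 - q) + q^2/4 = q^2/2 - q$. Thus $z'$ satisfies all the conditions set in Lemma 1, so applying that lemma we have $[[z]_q]_2 = [[z']_q]_2 = z'\langle r\rangle \oplus z'\langle 0\rangle$.

We next observe that $z' = z + (q+1)(q-1)/4 = z + (2^r + 2)2^{r-2} = z + 2^{2r-2} + 2^{r-1}$. Since $2r - 2 > r$, this means that the bits $0$ through $r$ in the binary representation of $z'$ are determined by $z + 2^{r-1}$ alone (for negative $z$, the bits $z\langle i\rangle$ with $i \le r$ are those of $z \bmod 2^{r+1}$). Adding $2^{r-1}$ leaves bit $0$ unchanged (as $r \ge 3$), flips bit $r-1$, and carries into bit $r$ exactly when $z\langle r-1\rangle = 1$, so we have:

$$z'\langle 0\rangle = z\langle 0\rangle, \qquad z'\langle r-1\rangle = z\langle r-1\rangle \oplus 1, \qquad z'\langle r\rangle = z\langle r\rangle \oplus z\langle r-1\rangle.$$

Putting it all together, we get $[[z]_q]_2 = [[z']_q]_2 = z'\langle r\rangle \oplus z'\langle 0\rangle = z\langle r\rangle \oplus z\langle r-1\rangle \oplus z\langle 0\rangle$. □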
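Corollary 1 can be checked in the same brute-force way. In the sketch below (again with illustrative helper names of our own), the bits of a negative $z$ are read in two's complement, which is what Python's `>>` gives for negative integers; this matches the proof, where only $z \bmod 2^{r+1}$ matters for bits $0$ through $r$.

```python
def bit(z: int, i: int) -> int:
    """The i'th bit of z; for negative z these are the two's-complement bits."""
    return (z >> i) & 1


def centered_mod(z: int, q: int) -> int:
    """[z]_q in (-q/2, q/2] for odd q."""
    t = z % q
    return t - q if t > q // 2 else t


def check_corollary1(r: int) -> int:
    """Check Corollary 1 for q = 2^r + 1 (r >= 3), including negative z."""
    assert r >= 3
    q = 2**r + 1
    tested = 0
    for z in range(-(q * q) // 4, (q * q) // 4 + 1):
        if 4 * abs(z) >= q * q - 4 * q:     # need |z| < q^2/4 - q
            continue
        zq = centered_mod(z, q)
        if 4 * abs(zq) >= q:                # need [z]_q in (-q/4, q/4)
            continue
        # [[z]_q]_2 is the parity of the centered residue (Python's % 2 is 0 or 1).
        assert zq % 2 == bit(z, r) ^ bit(z, r - 1) ^ bit(z, 0), (r, z)
        tested += 1
    return tested


for r in range(3, 9):
    print(r, check_corollary1(r))
```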

Using Corollary 1 we can get our simplified decryption formula. First, we set our parameters such that $q_L = 2^r + 1$ and all the coefficients of the integer polynomial $Z = \langle \mathbf{c}, \mathbf{s}\rangle \bmod F(X)$ are smaller than $q_L^2/4 - q_L$ in absolute value, and moreover they are all less than $q_L/4$ away from a multiple of $q_L$. Given a two-element ciphertext $\mathbf{c} = (c_0, c_1) \in ((\mathbb{Z}/q_L\mathbb{Z})[X]/F(X))^2$, we compute $Z \leftarrow \langle \mathbf{c}, \mathbf{s}\rangle \bmod F(X)$ over the integers (without reduction mod $q_L$), and finally recover the plaintext as $Z\langle r\rangle + Z\langle r-1\rangle + Z\langle 0\rangle$. Ultimately, we obtain the plaintext polynomial $a \in \mathbb{F}_2[X]/F(X)$, where each coefficient in $a$ is obtained as the XOR of bits $0$, $r-1$, and $r$ of the corresponding coefficient in $Z$.

Working modulo $2^{r+1}$. Since we are only interested in the contents of bit positions $0$, $r-1$, and $r$ in the polynomial $Z$, we can compute $Z$ modulo $2^{r+1}$ rather than over the integers. Observing that when $q_L = 2^r + 1$ then $(q_L^2 - 1)/4 \equiv 2^{r-1} \pmod{2^{r+1}}$, our simplified decryption of a ciphertext vector $\mathbf{c} = (c_0, c_1)$ proceeds as follows:

1. Compute $Z \leftarrow [\langle \mathbf{c}, \mathbf{s}\rangle \bmod F(X)]_{2^{r+1}}$;

2. Recover the 0-1 plaintext polynomial $a = [Z\langle r\rangle + Z\langle r-1\rangle + Z\langle 0\rangle]_2$.
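The two steps above can be exercised end to end on a toy example. The sketch below assumes $F(X) = X^n + 1$ with small $n$, a ternary secret and a deliberately insecure symmetric toy encryption (`toy_encrypt` and the other helpers are hypothetical, written only for this illustration). It compares the standard decryption $[[\langle\mathbf{c},\mathbf{s}\rangle]_{q_L}]_2$ with the simplified one, which only looks at bits $0$, $r-1$ and $r$ of $Z \bmod 2^{r+1}$.

```python
import random


def negacyclic_mul(a, b):
    """a(X) * b(X) in Z[X]/(X^n + 1), computed over the integers (no modular reduction)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj      # X^n = -1
    return out


def centered_mod(x, q):
    t = x % q
    return t - q if t > q // 2 else t


def standard_decrypt(c0, c1, s, q):
    """a = [[<c, s>]_q]_2 with the key vector s = (1, s)."""
    z = [u + v for u, v in zip(c0, negacyclic_mul(c1, s))]
    return [centered_mod(zi, q) % 2 for zi in z]


def simplified_decrypt(c0, c1, s, r):
    """a = Z<r> xor Z<r-1> xor Z<0>, where Z = <c, s> mod 2^{r+1} (assumes q = 2^r + 1)."""
    mod = 2 ** (r + 1)
    z = [(u + v) % mod for u, v in zip(c0, negacyclic_mul(c1, s))]
    return [((zi >> r) ^ (zi >> (r - 1)) ^ zi) & 1 for zi in z]


def toy_encrypt(a, s, q, rng):
    """Toy, insecure encryption of a bit polynomial a: <c, s> = a + 2e (mod q), centered c."""
    n = len(a)
    c1 = [rng.randrange(q) - q // 2 for _ in range(n)]
    e = [rng.randint(-2, 2) for _ in range(n)]
    cs = negacyclic_mul(c1, s)
    c0 = [centered_mod(ai + 2 * ei - v, q) for ai, ei, v in zip(a, e, cs)]
    return c0, c1


r, n = 16, 8
q = 2**r + 1
rng = random.Random(1)
s = [rng.choice([-1, 0, 1]) for _ in range(n)]
a = [rng.randrange(2) for _ in range(n)]
c0, c1 = toy_encrypt(a, s, q, rng)
print(standard_decrypt(c0, c1, s, q) == a, simplified_decrypt(c0, c1, s, r) == a)
```

The toy parameters keep every coefficient of $Z$ far below $q_L^2/4 - q_L$ and the noise $a + 2e$ well inside $(-q_L/4, q_L/4)$, i.e., within the hypotheses of Corollary 1.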

3  Basic  Homomorphic  Decryption 

To get a homomorphic implementation of the simplified decryption formula from above, we use an instance of our homomorphic encryption scheme with underlying plaintext space $\mathbb{Z}_{2^{r+1}}$. Namely, denoting by $\tilde{\mathbf{s}}$ the level-0 secret key and by $q_0$ the largest modulus, a ciphertext encrypting $a \in (\mathbb{Z}/2^{r+1}\mathbb{Z})[X]/F(X)$ with respect to $\tilde{\mathbf{s}}$ and $q_0$ is a 2-vector $\mathbf{c}$ over $(\mathbb{Z}/q_0\mathbb{Z})[X]/F(X)$ such that $|[\langle \mathbf{c}, \tilde{\mathbf{s}}\rangle \bmod F(X)]_{q_0}| \ll q_0$ and $[\langle \mathbf{c}, \tilde{\mathbf{s}}\rangle \bmod F(X)]_{q_0} \equiv a \pmod{2^{r+1}}$.

Recall that the ciphertext before bootstrapping is with respect to secret key $\mathbf{s}$ and modulus $q_L = 2^r + 1$. In this section we only handle the simple case where the public key includes an encryption of each coefficient of the secret key $\mathbf{s}$ separately. Namely, denoting $\mathbf{s} = (1, s)$ and $s(X) = \sum_j s_j X^j$, we encode for each $j$ the coefficient $s_j$ as the constant polynomial $s_j \in (\mathbb{Z}/2^{r+1}\mathbb{Z})[X]/F(X)$. (I.e., the degree-$d$ polynomial whose free term is $s_j \in [-2^r + 1, 2^r]$ and all the other coefficients are zero.) Then for each $j$ we include in the public key a ciphertext $\mathbf{c}_j$ that encrypts this constant polynomial $s_j$ with respect to $\tilde{\mathbf{s}}$ and $q_0$. Below we abuse notation somewhat, using the same notation to refer both to a constant polynomial $z \in (\mathbb{Z}/2^{r+1}\mathbb{Z})[X]/F(X)$ and the free term of that polynomial $z \in \mathbb{Z}/2^{r+1}\mathbb{Z}$.

Computing $Z$ Homomorphically. Given the $q_L$-ciphertext $\mathbf{c} = (c_0, c_1)$ (that encrypts a plaintext polynomial $a \in \mathbb{F}_2[X]/F(X)$), we use the encryption of $s$ from the public key to compute the simple decryption formula from above. Computing an encryption of $Z = [\langle \mathbf{c}, \mathbf{s}\rangle \bmod F(X)]_{2^{r+1}}$ is easy, since the coefficients of $Z$ are just affine functions (over $\mathbb{Z}/2^{r+1}\mathbb{Z}$) of the coefficients of $s$, which we can compute from the encryptions of the $s_j$'s in the public key.
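To make "affine" concrete: for a fixed $q_L$-ciphertext, multiplication by $c_1$ is a public linear map on the coefficient vector of $s$. The sketch below assumes $F(X) = X^n + 1$ (so the map is a negacyclic matrix) and builds a public pair $(M, \mathbf{b})$ with $Z \equiv M s + \mathbf{b} \pmod{2^{r+1}}$; the helper names are illustrative. Homomorphically, the same public coefficients would be applied to the public-key ciphertexts encrypting the $s_j$'s, which needs only ciphertext additions and multiplications by public constants.

```python
import random


def negacyclic_matrix(c1):
    """Matrix M of 'multiply by c1' in Z[X]/(X^n + 1): (c1 * s)[i] = sum_j M[i][j] * s[j]."""
    n = len(c1)
    M = [[0] * n for _ in range(n)]
    for j in range(n):                      # column j holds c1(X) * X^j mod (X^n + 1)
        for k, ck in enumerate(c1):
            if j + k < n:
                M[j + k][j] += ck
            else:
                M[j + k - n][j] -= ck       # X^n = -1
    return M


def z_affine_map(c0, c1, r):
    """Public (M, b) with Z = M s + b (mod 2^{r+1}) for the q_L-ciphertext c = (c0, c1)."""
    mod = 2 ** (r + 1)
    M = [[x % mod for x in row] for row in negacyclic_matrix(c1)]
    b = [x % mod for x in c0]
    return M, b


def apply_affine(M, b, s, mod):
    """Evaluate the affine map on the secret coefficients s_j (here in the clear)."""
    return [(bi + sum(m_ij * s_j for m_ij, s_j in zip(row, s))) % mod
            for row, bi in zip(M, b)]


r, n = 10, 8
q = 2**r + 1
rng = random.Random(2)
c0 = [rng.randrange(q) - q // 2 for _ in range(n)]
c1 = [rng.randrange(q) - q // 2 for _ in range(n)]
s = [rng.choice([-1, 0, 1]) for _ in range(n)]
M, b = z_affine_map(c0, c1, r)
print(apply_affine(M, b, s, 2 ** (r + 1)))
```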


3.1  Extracting  the  Top  and  Bottom  Bits 

Now that we have encryptions of the coefficients of $Z$, we need to extract the relevant three bits in each of these coefficients and add them (modulo 2) to get encryptions of the plaintext coefficients. In more detail, given a ciphertext $\mathbf{c}$ satisfying $[\langle \mathbf{c}, \tilde{\mathbf{s}}\rangle \bmod F(X)]_{q_0} \equiv z \pmod{2^{r+1}}$, where $z$ is some constant polynomial, we would like to compute another ciphertext $\mathbf{c}'$ satisfying $[\langle \mathbf{c}', \tilde{\mathbf{s}}\rangle \bmod F(X)]_{q_0} \equiv z\langle 0\rangle + z\langle r-1\rangle + z\langle r\rangle \pmod 2$ (with $[\langle \mathbf{c}', \tilde{\mathbf{s}}\rangle \bmod F(X)]_{q_0}$ still much smaller than $q_0$ in magnitude). To this end, we describe a procedure to compute for all $i = 0, 1, \ldots, r$ a ciphertext $\mathbf{c}_i$ satisfying $[\langle \mathbf{c}_i, \tilde{\mathbf{s}}\rangle \bmod F(X)]_{q_0} \equiv z\langle i\rangle \pmod 2$. Clearly, we can immediately set $\mathbf{c}_0 = \mathbf{c}$; we now describe how to compute the other $\mathbf{c}_i$'s.

The basic observation underlying this procedure is that modulo a power of 2, the second bit of $z - z^2$ is the same as that of $z$, but the LSB is zeroed out. Thus setting $z' = (z - z^2)/2$ (which is an integer), we get that the LSB of $z'$ is the second bit of $z$. More generally, we have the following lemma:


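Lemma 2. Let $z$ be an integer with binary representation $z = \sum_{i \ge 0} 2^i z\langle i\rangle$. Define $w_0 \stackrel{\mathrm{def}}{=} z$, and for $i \ge 1$ define

$$w_i \stackrel{\mathrm{def}}{=} \frac{z - \sum_{j=0}^{i-1} 2^j\, w_j^{2^{i-j}}}{2^i} \qquad \text{(division by } 2^i \text{ over the rationals).} \qquad (1)$$

Then the $w_i$'s are integers and we have $w_i\langle 0\rangle = z\langle i\rangle$ for all $i$.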
Proof. The lemma clearly holds for $i = 0$. Now fix some $i \ge 1$, assume that the lemma holds for all $j < i$, and we prove that it holds also for $i$. It is easy to show by induction that for any integer $u$ and all $j \le r$ we have

$$u^{2^j} = u\langle 0\rangle + t \cdot 2^{j+1} \quad \text{for some integer } t.$$

Namely, the LSB of $u^{2^j} \bmod 2^{j+1}$ is the same as the LSB of $u$, and the next $j$ bits are all zero. This means that the bit representation of $v_j = 2^j w_j^{2^{i-j}} \bmod 2^{i+1}$ has bits $0, 1, \ldots, j-1$ all zero (due to the multiplication by $2^j$), then $v_j\langle j\rangle = w_j\langle 0\rangle = z\langle j\rangle$ (by the induction hypothesis), and the next $i - j$ bits are again zero (by the observation above). In other words, the lowest $i+1$ bits of $v_j$ are all zero, except the $j$'th bit, which is equal to the $j$'th bit of $z$.

This means that the lowest $i$ bits of the sum $\sum_{j=0}^{i-1} v_j$ are the same as the lowest $i$ bits of $z$, and the $(i+1)$'st bit of the sum is zero. Hence the lowest $i$ bits of $z - \sum_{j=0}^{i-1} v_j$ are zero, and the $(i+1)$'st bit is $z\langle i\rangle$. Since each $v_j$ differs from $2^j w_j^{2^{i-j}}$ by a multiple of $2^{i+1}$, the same holds for the numerator in Equation (1). Hence that numerator is divisible by $2^i$ (over the integers), and the lowest bit of the result is $z\langle i\rangle$, as needed. □
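Equation (1) can be evaluated exactly over the integers for small $r$, which gives a direct check of Lemma 2. In the sketch below (function names are ours), `divmod` confirms that each division by $2^i$ leaves no remainder, and the LSB of each $w_i$ is compared with $z\langle i\rangle$ (two's-complement bits for negative $z$). The $w_i$ grow very quickly over the integers, which is one reason the procedure described next works modulo powers of 2 instead.

```python
def lemma2_ws(z: int, r: int):
    """Compute w_0, ..., w_r from Equation (1) exactly over the integers."""
    ws = [z]
    for i in range(1, r + 1):
        num = z - sum(2**j * ws[j] ** (2 ** (i - j)) for j in range(i))
        w, rem = divmod(num, 2**i)
        assert rem == 0, "Lemma 2 says the division by 2^i is exact"
        ws.append(w)
    return ws


def check_lemma2(r: int, zs) -> None:
    """Assert w_i<0> == z<i> for every z in zs and every i <= r."""
    for z in zs:
        for i, w in enumerate(lemma2_ws(z, r)):
            assert w & 1 == (z >> i) & 1, (z, i)


check_lemma2(5, range(-100, 101))
print(lemma2_ws(5, 2))
```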


Our procedure for computing the ciphertexts $\mathbf{c}_i$ mirrors Lemma 2. Specifically, we are given the ciphertext $\mathbf{c} = \mathbf{c}_0$ that encrypts $z = w_0 \bmod 2^{r+1}$, and we iteratively compute ciphertexts $\mathbf{c}_1, \mathbf{c}_2, \ldots$ such that $\mathbf{c}_i$ encrypts $w_i \bmod 2^{r-i+1}$. Eventually we get that $\mathbf{c}_r$ encrypts $w_r \bmod 2$, which is what we need (since the LSB of $w_r$ is the $r$'th bit of $z$).

Note that most of the operations in Lemma 2 are carried out in $\mathbb{Z}/2^{r+1}\mathbb{Z}$, and therefore can be evaluated homomorphically in our $(\mathbb{Z}/2^{r+1}\mathbb{Z})$-homomorphic cryptosystem. The only exception is the division by $2^i$ in Equation (1), and we now show how this division can also be evaluated homomorphically. To implement division we begin with an arbitrary ciphertext vector $\mathbf{c}$ that encrypts a plaintext element $a \in (\mathbb{Z}/2^j\mathbb{Z})[X]/F(X)$ (for some $j$) with respect to the level-0 key $\tilde{\mathbf{s}}$ and modulus $q_0$. Namely, we have the equality over $\mathbb{Z}[X]$:


$$(\langle \mathbf{c}, \tilde{\mathbf{s}}\rangle \bmod F(X)) = a + 2^j S + q_0 \cdot T$$


for some polynomials $S, T \in \mathbb{Z}[X]/F(X)$, where the norm of $a + 2^j S$ is much smaller than $q_0$. Assuming that $a$ is divisible by 2 over the integers (i.e., all its coefficients are even), consider what happens when we multiply $\mathbf{c}$ by the integer $(q_0 + 1)/2$ (which is the inverse of 2 modulo $q_0$). Then we have


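$$\begin{aligned}
\left(\left\langle \tfrac{q_0+1}{2}\,\mathbf{c}, \tilde{\mathbf{s}}\right\rangle \bmod F(X)\right) &= \tfrac{q_0+1}{2}\cdot\left(\langle \mathbf{c}, \tilde{\mathbf{s}}\rangle \bmod F(X)\right)\\
&= \frac{(q_0+1)\cdot a}{2} + \frac{(q_0+1)\cdot 2^j \cdot S}{2} + \frac{q_0\cdot(q_0+1)\cdot T}{2}\\
&= (q_0+1)\cdot(a/2) + (q_0+1)\cdot 2^{j-1}S + q_0\cdot\tfrac{q_0+1}{2}\cdot T\\
&= a/2 + 2^{j-1}S + q_0\cdot\left(a/2 + 2^{j-1}S + \tfrac{q_0+1}{2}\cdot T\right)
\end{aligned}$$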


Clearly the coefficients of $a/2 + 2^{j-1}S$ are half the size of those of $a + 2^j S$, hence they are much smaller than $q_0$. It follows that $\mathbf{c}' = [\mathbf{c}\cdot(q_0+1)/2]_{q_0}$ is a valid ciphertext that encrypts the plaintext $a/2 \in (\mathbb{Z}/2^{j-1}\mathbb{Z})[X]/F(X)$ with respect to secret key $\tilde{\mathbf{s}}$ and modulus $q_0$.
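The effect of multiplying by $(q_0+1)/2$ is easy to observe on a toy ciphertext. The sketch assumes $F(X) = X^n + 1$, an odd toy modulus $q_0$ and plaintext modulus $2^j$ with $j = 4$; all names and parameters are illustrative and insecure. `raw_decrypt` returns the centered value $a + 2^jS$, and after `divide_by_two` the same ciphertext decrypts to $a/2 + 2^{j-1}S$.

```python
import random


def negacyclic_mul(a, b):
    """a(X) * b(X) in Z[X]/(X^n + 1) over the integers."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out


def centered_mod(x, q):
    t = x % q
    return t - q if t > q // 2 else t


def raw_decrypt(c, s, q0):
    """[<c, (1, s)> mod (X^n + 1)]_{q0}, i.e. a + 2^j S before any reduction mod 2^j."""
    c0, c1 = c
    return [centered_mod(u + v, q0) for u, v in zip(c0, negacyclic_mul(c1, s))]


def divide_by_two(c, q0):
    """Multiply both components by (q0 + 1)/2, the inverse of 2 modulo an odd q0."""
    half = (q0 + 1) // 2
    return tuple([(x * half) % q0 for x in comp] for comp in c)


rng = random.Random(7)
n, j, q0 = 8, 4, 1_000_003                                 # toy parameters; q0 must be odd
s = [rng.choice([-1, 0, 1]) for _ in range(n)]
a = [2 * rng.randrange(2 ** (j - 1)) for _ in range(n)]    # even plaintext coefficients in Z/2^j
S = [rng.randint(-3, 3) for _ in range(n)]                 # small noise polynomial
c1 = [rng.randrange(q0) for _ in range(n)]
c0 = [(ai + 2**j * Si - v) % q0 for ai, Si, v in zip(a, S, negacyclic_mul(c1, s))]
c = (c0, c1)

before = raw_decrypt(c, s, q0)                       # a + 2^j * S
after = raw_decrypt(divide_by_two(c, q0), s, q0)     # a/2 + 2^(j-1) * S
print(before)
print(after)
```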

The same argument shows that if $a$ is divisible by $2^i$ over the integers (for some $i < j$), then $[\mathbf{c}\cdot((q_0+1)/2)^i]_{q_0}$ is a valid ciphertext encrypting $a/2^i \in (\mathbb{Z}/2^{j-i}\mathbb{Z})[X]/F(X)$. Combining this division-by-two procedure with homomorphic exponentiation mod $2^{r+1}$, the resulting homomorphic bit-extraction procedure is described in Figure 1.


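Bit-Extraction$(\mathbf{c}, r, q_0)$:

Input: A ciphertext $\mathbf{c}$ encrypting a constant $b \in \mathbb{Z}/2^{r+1}\mathbb{Z}$ w.r.t. secret key $\tilde{\mathbf{s}}$ and modulus $q_0$.
Output: A ciphertext $\mathbf{c}'$ encrypting $b\langle 0\rangle \oplus b\langle r-1\rangle \oplus b\langle r\rangle \in \mathbb{F}_2$ w.r.t. secret key $\tilde{\mathbf{s}}$ and modulus $q_0$.

1. Set $\mathbf{c}_0 \leftarrow \mathbf{c}$   // $\mathbf{c}_0$ encrypts $b$ w.r.t. $\tilde{\mathbf{s}}$
2. For $i = 1$ to $r$
3.     Set $acc \leftarrow \mathbf{c}_0$   // acc is an accumulator
4.     For $j = 0$ to $i - 1$   // compute $b - \sum_{j<i} 2^j w_j^{2^{i-j}}$ (Lemma 2)
5.         Set $tmp \leftarrow \mathrm{HomExp}(\mathbf{c}_j, 2^{i-j})$   // homomorphic exponentiation to the power $2^{i-j}$
6.         Set $acc \leftarrow acc - 2^j \cdot tmp \bmod q_0$
7.     Set $\mathbf{c}_i \leftarrow acc \cdot ((q_0+1)/2)^i \bmod q_0$   // $\mathbf{c}_i$ encrypts $w_i$, whose LSB is $b\langle i\rangle$
8. Output $\mathbf{c}_0 + \mathbf{c}_{r-1} + \mathbf{c}_r \bmod q_0$

HomExp$(\mathbf{c}, n)$ uses native homomorphic multiplication to raise $\mathbf{c}$ to the $n$'th power (for $n = 2^k$, by $k$ successive squarings). To aid exposition, this code assumes that the modulus and secret key remain fixed; otherwise modulus-switching and key-switching should be added (and the level increased correspondingly to some $i > 0$).

Fig. 1. A Homomorphic Bit-Extraction Procedure.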
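Since the procedure in Figure 1 only adds, multiplies, exponentiates and divides by powers of 2, its arithmetic can be checked on plaintexts in the clear. The sketch below is such a plaintext simulation (an illustration only, not an implementation of the homomorphic procedure; the function name is ours): each ciphertext $\mathbf{c}_i$ is replaced by the value $w_i \bmod 2^{r+1-i}$ it would encrypt, `pow` stands in for HomExp, and multiplication by $((q_0+1)/2)^i$ becomes exact division by $2^i$. Noise growth, modulus switching and key switching are not modeled.

```python
def bit_extraction_plaintext(b: int, r: int) -> int:
    """Run the steps of Figure 1 on a plaintext b in Z/2^{r+1}Z (no encryption).

    c[i] plays the role of the ciphertext c_i and holds w_i mod 2^{r+1-i}.
    """
    mod = 2 ** (r + 1)
    c = [b % mod]                                  # step 1: c_0 <- c
    for i in range(1, r + 1):                      # step 2
        acc = c[0]                                 # step 3
        for j in range(i):                         # step 4
            tmp = pow(c[j], 2 ** (i - j), mod)     # step 5: stand-in for HomExp(c_j, 2^{i-j})
            acc = (acc - 2**j * tmp) % mod         # step 6
        assert acc % 2**i == 0                     # Lemma 2: the division in step 7 is exact
        c.append(acc // 2**i)                      # step 7: multiply by ((q0+1)/2)^i
    return (c[0] + c[r - 1] + c[r]) % 2            # step 8, read off modulo 2


for b in (0, 5, 12, 31):
    print(b, bit_extraction_plaintext(b, 4))
```

The exponent $2^{i-j}$ in step 5 reaches $2^r$ for $j = 0$ and $i = r$; this is the degree that Section 3.3 sets out to reduce.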


3.2  Packing  the  Coefficients 

Now that we have encryptions of all the coefficients of $a$, we just need to “pack” all these coefficients back in one polynomial. Namely, we have encryptions of the constant polynomials $a_0, a_1, \ldots$, and we want to get an encryption of the polynomial $a(X) = \sum_i a_i X^i$. Since $a$ is just a linear combination of the $a_i$'s (with the coefficient of each $a_i$ being the “scalar” $X^i \in (\mathbb{Z}/2\mathbb{Z})[X]/\Phi_m(X)$), we can just use the additive homomorphism of the cryptosystem to compute an encryption of $a$ from the encryptions of the $a_i$'s.
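Packing uses only the additive homomorphism plus multiplication by the public plaintext “scalars” $X^i$. The sketch below illustrates this with a deliberately insecure toy scheme over $\mathbb{Z}_q[X]/(X^n+1)$ (an assumption standing in for $\Phi_m(X)$; all helper names are hypothetical): each ciphertext of a constant $a_i$ is multiplied componentwise by $X^i$, and the results are summed.

```python
import random


def negacyclic_mul(a, b, q):
    """a(X) * b(X) in Z_q[X]/(X^n + 1)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return [x % q for x in out]


def monomial(i, n):
    """The public plaintext 'scalar' X^i as a coefficient list."""
    return [1 if k == i else 0 for k in range(n)]


def centered_mod(x, q):
    t = x % q
    return t - q if t > q // 2 else t


def encrypt_constant(bit, s, q, rng):
    """Toy, insecure encryption of the constant polynomial `bit` (plaintext modulus 2)."""
    n = len(s)
    c1 = [rng.randrange(q) for _ in range(n)]
    noise = [2 * rng.randint(-2, 2) for _ in range(n)]
    m = [bit] + [0] * (n - 1)
    c0 = [(mk + ek - v) % q for mk, ek, v in zip(m, noise, negacyclic_mul(c1, s, q))]
    return c0, c1


def decrypt(c, s, q):
    c0, c1 = c
    return [centered_mod(u + v, q) % 2 for u, v in zip(c0, negacyclic_mul(c1, s, q))]


def pack(cts, q):
    """Encryption of sum_i a_i X^i from encryptions of the constants a_i (additions only)."""
    n = len(cts[0][0])
    acc0, acc1 = [0] * n, [0] * n
    for i, (c0, c1) in enumerate(cts):
        xi = monomial(i, n)
        acc0 = [(u + v) % q for u, v in zip(acc0, negacyclic_mul(c0, xi, q))]
        acc1 = [(u + v) % q for u, v in zip(acc1, negacyclic_mul(c1, xi, q))]
    return acc0, acc1


q, n = 12289, 8
rng = random.Random(4)
s = [rng.choice([-1, 0, 1]) for _ in range(n)]
a = [rng.randrange(2) for _ in range(n)]
cts = [encrypt_constant(ai, s, q, rng) for ai in a]
print(decrypt(pack(cts, q), s, q) == a)
```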

3.3  Lower-Degree  Bit  Extraction 

As described in Figure 1, extracting the $r$'th bit requires computing polynomials of degree up to $2^r$; here we describe a simple trick to lower this degree. Recall our simplified decryption process: we set $Z \leftarrow [\langle \mathbf{c}, \mathbf{s}\rangle \bmod \Phi_m(X)]_{2^{r+1}}$, and then recover $a = [Z\langle r\rangle + Z\langle r-1\rangle + Z\langle 0\rangle]_2$.

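Consider what happens if we add $q_L$ to all the odd coefficients in $\mathbf{c}$, and call the resulting vector $\mathbf{c}'$. On one hand, now all the coefficients of $\mathbf{c}'$ are even. On the other hand, the coefficients of $Z' = \langle \mathbf{c}', \mathbf{s}\rangle \bmod \Phi_m(X)$ are still small enough to use Lemma 1 (since they are at most $c_m \cdot q_L \cdot \|\mathbf{s}\|_1$ larger than those of $Z$ itself, where $c_m$ is the ring constant of mod-$\Phi_m(X)$ arithmetic and $\|\mathbf{s}\|_1$ is the $\ell_1$-norm of $\mathbf{s}$). Since $\mathbf{c}' \equiv \mathbf{c} \pmod{q_L}$, we have

$$[[\langle \mathbf{c}, \mathbf{s}\rangle \bmod \Phi_m(X)]_{q_L}]_2 = [[\langle \mathbf{c}', \mathbf{s}\rangle \bmod \Phi_m(X)]_{q_L}]_2 = Z'\langle r\rangle + Z'\langle r-1\rangle + Z'\langle 0\rangle \pmod 2.$$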

However, since $\mathbf{c}'$ is even, so is $Z'$. This means that $Z'\langle 0\rangle = 0$, and if we divide $Z'$ by two (over the integers), $Z'' = Z'/2$, then we have $[[\langle \mathbf{c}, \mathbf{s}\rangle \bmod \Phi_m(X)]_{q_L}]_2 = Z''\langle r-1\rangle \oplus Z''\langle r-2\rangle$. We thus have a variation of the simple decryption formula that only needs to extract the $(r-1)$'st and $(r-2)$'nd bits, so it can be realized using polynomials of degree up to $2^{r-1}$. Note that we can implement this variant of the decryption formula homomorphically, because $Z'$ is even, so a $q_0$-encryption of $Z'$ can be easily converted into an encryption of $Z'/2$ (by multiplying by $2^{-1}$ modulo $q_0$ as described in Section 3.1).
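Here is a minimal numerical sketch of this halving trick. For brevity it replaces ring elements by integer vectors (an LWE-style inner product $\langle \mathbf{c}, \mathbf{s}\rangle = c_0 + \sum_k c_k s_k$, so effectively $c_m = 1$). This is a simplification made only for illustration, and the helper names are ours. The sketch adds $q_L$ to the odd entries of $\mathbf{c}$ and reads the plaintext from bits $r-1$ and $r-2$ of $Z'' = Z'/2$.

```python
import random


def bit(x, i):
    """i'th bit of x (two's complement for negative x)."""
    return (x >> i) & 1


def centered_mod(x, q):
    t = x % q
    return t - q if t > q // 2 else t


def toy_encrypt_bit(a, s, q, rng):
    """Toy, insecure LWE-style encryption: <c, s> = a + 2e (mod q), s = (1, s_1, ..., s_n)."""
    rest = [rng.randrange(q) - q // 2 for _ in range(len(s) - 1)]
    e = rng.randint(-3, 3)
    c0 = centered_mod(a + 2 * e - sum(ci * si for ci, si in zip(rest, s[1:])), q)
    return [c0] + rest


def decrypt_halved(c, s, r):
    """Read the plaintext from bits r-1 and r-2 of Z'' = <c', s>/2, where q = 2^r + 1."""
    q = 2**r + 1
    c_even = [ci + q if ci % 2 else ci for ci in c]   # c' = c (mod q), every entry even
    z1 = sum(ci * si for ci, si in zip(c_even, s))     # Z' over the integers, so Z'<0> = 0
    assert z1 % 2 == 0
    z2 = z1 // 2                                      # Z'' = Z'/2
    return bit(z2, r - 1) ^ bit(z2, r - 2)


r, n = 16, 16
q = 2**r + 1
rng = random.Random(3)
s = [1] + [rng.choice([-1, 0, 1]) for _ in range(n)]
for a in (0, 1):
    print(a, decrypt_halved(toy_encrypt_bit(a, s, q, rng), s, r))
```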

This technique can be pushed a little further, adding to $\mathbf{c}$ multiples of $q_L$ so that it is divisible by 4, 8, 16, etc., and reducing the required degree correspondingly to $2^{r-2}$, $2^{r-3}$, $2^{r-4}$, etc. The limiting factor is that we must maintain that $\langle \mathbf{c}', \mathbf{s}\rangle$ has coefficients sufficiently smaller than $q_L^2$, in order to be able to use Lemma 1. Clearly, if $\mathbf{c}' = \mathbf{c} + q_L\boldsymbol{\kappa}$ where all the coefficients of $\boldsymbol{\kappa}$ are smaller than some bound $B$ (in absolute value), then the coefficients of $\langle \mathbf{c}', \mathbf{s}\rangle$ can be larger than the coefficients of $Z = \langle \mathbf{c}, \mathbf{s}\rangle$ (in absolute value) by at most $c_m \cdot q_L \cdot B \cdot \|\mathbf{s}\|_1$. (Heuristically we expect the difference to depend on the $\ell_2$ norm of $\mathbf{s}$ more than its $\ell_1$ norm.)

If we choose our parameters such that the $\ell_1$-norm of $\mathbf{s}$ is below $m$, and work over a ring with $c_m = O(1)$, then the coefficients of $Z$ can be made as small as $c_m \cdot m \cdot q_L$, and we can make the coefficients of $\boldsymbol{\kappa}$ as large as $B \approx q_L/(4c_m \cdot m)$ in absolute value while maintaining the invariant that the coefficients of $Z'$ are smaller than $q_L^2/4$ (which is what we need to be able to use Lemma 1). By choosing an appropriate $\boldsymbol{\kappa}$, we can ensure that the least significant $\lfloor \log(q_L/(4c_m m))\rfloor = r - \lceil \log(4c_m m)\rceil$ bits of $\mathbf{c}'$ are all zero. This means that we can implement bit extraction using only polynomials of degree at most $2^{\lceil \log(4c_m m)\rceil} \le 8c_m m = O(m)$. (Heuristically, we should even be able to get polynomials of degree $O(\sqrt{m})$, since the $\ell_2$ norm of $\mathbf{s}$ is only $O(\sqrt{m})$.) Moreover, if we assume that ring-LWE is hard even with a very sparse secret, then we can use a secret key with even smaller norm and get the same reduction in the degree of the bit-extraction routine.
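The general version of the trick chooses $\boldsymbol{\kappa}$ so that $\mathbf{c} + q_L\boldsymbol{\kappa}$ is divisible by $2^t$. One simple way to do this, assumed here purely for illustration, is $\kappa_i = -c_i \cdot q_L^{-1} \bmod 2^t$, so that $0 \le \kappa_i < 2^t$ and $B = 2^t$. For brevity the sketch uses integer vectors (an LWE-style inner product) rather than ring elements, and all names are ours. Decryption then needs only bits $r-t$ and $r-t-1$ of $Z'/2^t$, i.e., bit extraction of degree about $2^{r-t}$ instead of $2^r$.

```python
import random


def bit(x, i):
    return (x >> i) & 1


def centered_mod(x, q):
    t = x % q
    return t - q if t > q // 2 else t


def toy_encrypt_bit(a, s, q, rng):
    """Toy, insecure LWE-style encryption: <c, s> = a + 2e (mod q), s = (1, s_1, ..., s_n)."""
    rest = [rng.randrange(q) - q // 2 for _ in range(len(s) - 1)]
    e = rng.randint(-3, 3)
    c0 = centered_mod(a + 2 * e - sum(ci * si for ci, si in zip(rest, s[1:])), q)
    return [c0] + rest


def choose_kappa(c, q, t):
    """kappa_i in [0, 2^t) such that c_i + q * kappa_i is divisible by 2^t (q odd)."""
    q_inv = pow(q, -1, 2**t)                     # inverse of q modulo 2^t (Python 3.8+)
    return [(-ci * q_inv) % 2**t for ci in c]


def decrypt_shifted(c, s, r, t):
    """Plaintext = bit (r-t) xor bit (r-t-1) of Z'/2^t, where Z' = <c + q*kappa, s>."""
    q = 2**r + 1
    c_prime = [ci + q * ki for ci, ki in zip(c, choose_kappa(c, q, t))]
    z1 = sum(ci * si for ci, si in zip(c_prime, s))
    assert z1 % 2**t == 0                        # the low t bits of Z' are zero
    z3 = z1 >> t                                 # exact division by 2^t
    return bit(z3, r - t) ^ bit(z3, r - t - 1)


r, n = 20, 16
q = 2**r + 1
rng = random.Random(11)
s = [1] + [rng.choice([-1, 0, 1]) for _ in range(n)]
c = toy_encrypt_bit(1, s, q, rng)
for t in (1, 4, 8, 12):
    # extracting bit r - t needs polynomials of degree about 2^(r - t)
    print(t, 2 ** (r - t), decrypt_shifted(c, s, r, t))
```

With these toy sizes ($\|\mathbf{s}\|_1 \le 17$ and $r = 20$), $t \le 12$ keeps $|Z'|$ comfortably below $q_L^2/4 - q_L$; larger $t$ would eventually break the hypotheses of Corollary 1, which is the limiting factor described above.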

4  Homomorphic  Decryption  with  Packed  Ciphertexts 

The homomorphic decryption procedure from Section 3 is rather inefficient, mostly because we need to repeat the bit-extraction procedure from Figure 1 for each coefficient separately. Instead, we would like to pack many coefficients in one ciphertext and extract the top bits of all of them together. To this end we employ a batching technique, similar to [1, 11, 14], using Chinese remaindering over the ring of polynomials to pack many “plaintext slots” inside a single plaintext polynomial.

Recall that the BGV scheme is defined over a polynomial ring $R = \mathbb{Z}[X]/F(X)$. If the polynomial $F(X)$ factors modulo two into distinct irreducible polynomials $F_0(X) \times \cdots \times F_{\ell-1}(X)$, then, by the Chinese Remainder Theorem, the plaintext space factors into a product of finite fields $R_2 = \mathbb{F}_2[X]/F_0(X) \times \cdots \times \mathbb{F}_2[X]/F_{\ell-1}(X)$.

This factorization is used in [14, 1, 11] to “pack” a vector of $\ell$ elements (one from each $\mathbb{F}_2[X]/F_i(X)$) into one plaintext polynomial, which is then encrypted in one ciphertext; each of the $\ell$ components is called a plaintext slot. The homomorphic operations (add/mult) are then applied to the different slots in a SIMD fashion. When $F(X)$ is the $m$-th cyclotomic polynomial, $F(X) = \Phi_m(X)$, then the field $\mathbb{Q}[X]/F(X)$ is Galois (indeed Abelian) and so the polynomials $F_i(X)$ all have the same degree (which we will denote by $d$). It was shown in [11] how to evaluate homomorphically the application of the Galois group on the slots, and in particular this enables homomorphically performing arbitrary permutations on the vector of slots in time quasi-linear in $m$. This, in turn, is used in [11] to evaluate arbitrary arithmetic circuits (of average width $\tilde\Omega(\lambda)$) with overhead only $\mathrm{polylog}(\lambda)$.
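The number of slots follows from a classical fact about cyclotomic polynomials over finite fields: for odd $m$, $\Phi_m(X)$ factors modulo 2 into $\phi(m)/d$ distinct irreducible polynomials of degree $d$, where $d$ is the multiplicative order of 2 modulo $m$ (the smallest $d$ with $m \mid 2^d - 1$). The sketch below computes $(d, \ell)$ for a given $m$; the function names are ours.

```python
def euler_phi(m):
    """Euler's totient phi(m) by trial division."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result


def order_of_two(m):
    """Smallest d >= 1 with 2^d = 1 (mod m), i.e. m | 2^d - 1; needs odd m > 1."""
    assert m > 1 and m % 2 == 1
    d, x = 1, 2 % m
    while x != 1:
        x = (2 * x) % m
        d += 1
    return d


def slot_structure(m):
    """(d, ell): degree of each irreducible factor of Phi_m mod 2, and the number of slots."""
    d = order_of_two(m)
    return d, euler_phi(m) // d


for m in (7, 15, 31, 127, 8191):
    print(m, slot_structure(m))
```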

However, the prior work only mentions the case of plaintext spaces taken modulo a prime (in our case two), i.e., $R_2$. In this work we will need to also consider plaintext spaces which are given by a power of a prime, i.e., $R_{2^{r+1}}$ for some positive integer $r$. (We stress that by $R_{2^{r+1}}$ we really do mean $(\mathbb{Z}/2^{r+1}\mathbb{Z})[X]/F(X)$ and not $\mathbb{F}_{2^{r+1}}[X]/F(X)$.) In the full version [10] we show how the techniques from [11] extend also to this case. The “high brow” way of seeing this is to consider the message space modulo $2^{r+1}$ as the precision-$(r+1)$ approximation to the 2-adic integers; namely, we need to consider the localization of the field $K = \mathbb{Q}[X]/F(X)$ at the prime 2.

4.1  Using  SIMD  Techniques  for  Bootstrapping 

Using the techniques from [11] for bootstrapping is not quite straightforward, however. The main difficulty is that the input and output of the bootstrapping procedure are not presented in a packed form: the input is a single $q_L$-ciphertext that encrypts a single plaintext polynomial $a$ (which may or may not have many plaintext elements packed in its slots), and similarly the output needs to be a single ciphertext that encrypts the same polynomial $a$, but with respect to a larger modulus. (We stress that this is not an artifact of our “simpler decryption formula”; we would need to overcome the same difficulty also if we tried to use these “SIMD techniques” to speed up bootstrapping under the standard approach of emulating the binary mod-$q_L$ circuit.) Our “packed bootstrapping” procedure consists of the following steps:

1. Using the encryption of the $q_L$-secret key with respect to the modulus $q_0$, we convert the initial $q_L$-ciphertext into a $q_0$-ciphertext encrypting the polynomial $Z \in (\mathbb{Z}/2^{r+1}\mathbb{Z})[X]/\Phi_m(X)$.

2. Next we apply a homomorphic inverse-DFT transformation to get encryptions of polynomials that have the coefficients of $Z$ in their plaintext slots.

3. Now that we have the coefficients of $Z$ in the plaintext slots, we apply the bit-extraction procedure to all these slots in parallel. The result is encryptions of polynomials that have the coefficients of $a$ in their plaintext slots.

4. Finally, we apply a homomorphic DFT transformation to get back a ciphertext that encrypts the polynomial $a$ itself.

Below  we  describe  each  of  these  steps  in  more  detail.  We  note  that  the  main 
challenge  is  to  get  an  efficient  implementation  of  Steps  2  and  4. 

4.2 Encrypting the $q_L$-Secret-Key

As in Section 3, we use an encryption scheme with underlying plaintext space modulo $2^{r+1}$ to encrypt the $q_L$-secret key $\mathbf{s}$ under the $q_0$-secret key $\tilde{\mathbf{s}}$. The $q_L$-secret key is a vector $\mathbf{s} = (1, s)$, where $s \in \mathbb{Z}[X]/\Phi_m(X)$ is an integer polynomial with small coefficients. Viewing these small coefficients as elements in $\mathbb{Z}/2^{r+1}\mathbb{Z}$, we encrypt $s$ as a $q_0$-ciphertext $\tilde{\mathbf{c}} = (\tilde c_0, \tilde c_1)$ with respect to the $q_0$-secret key $\tilde{\mathbf{s}} = (1, \tilde s)$; namely, we have

$$[\langle \tilde{\mathbf{c}}, \tilde{\mathbf{s}}\rangle \bmod \Phi_m]_{q_0} = [\tilde c_0 + \tilde c_1 \cdot \tilde s \bmod \Phi_m]_{q_0} = 2^{r+1}k' + s \qquad \text{(equality over } \mathbb{Z}[X]\text{)}$$

for some polynomial $k'$ with small coefficients.

4.3  Step  One:  Computing  Z  Homomorphically 

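Given a $q_L$-ciphertext $\mathbf{c} = (c_0, c_1)$, we recall from the public key the $q_0$-ciphertext $\tilde{\mathbf{c}} = (\tilde c_0, \tilde c_1)$ that encrypts $s$, then compute the mod-$2^{r+1}$ inner product homomorphically by setting

$$\mathbf{z} = \left( [c_0 + c_1 \tilde c_0 \bmod \Phi_m]_{q_0},\ [c_1 \tilde c_1 \bmod \Phi_m]_{q_0} \right). \qquad (2)$$

We claim that $\mathbf{z}$ is a $q_0$-ciphertext encrypting our $Z$ with respect to the secret key $\tilde{\mathbf{s}}$ (and plaintext space modulo $2^{r+1}$). To see that, recall that we have the following two equalities over $\mathbb{Z}[X]$,

$$(c_0 + c_1 s \bmod \Phi_m) = 2^{r+1}k + Z \qquad\text{and}\qquad (\tilde c_0 + \tilde c_1 \tilde s \bmod \Phi_m) = q_0\kappa + 2^{r+1}k' + s,$$

where $k, \kappa, k' \in \mathbb{Z}[X]/\Phi_m$, the coefficients of $2^{r+1}k + Z$ are smaller than $q_L^2$, and the coefficients of $2^{r+1}k' + s$ are also much smaller than $q_0$. It follows that:

$$\begin{aligned}
(\langle \mathbf{z}, \tilde{\mathbf{s}}\rangle \bmod \Phi_m) &= [c_0 + c_1\tilde c_0 \bmod \Phi_m]_{q_0} + (\tilde s \cdot [c_1 \tilde c_1 \bmod \Phi_m]_{q_0} \bmod \Phi_m)\\
&= (c_0 + c_1(\tilde c_0 + \tilde c_1 \tilde s) \bmod \Phi_m) + q_0\kappa_1\\
&= (c_0 + c_1(2^{r+1}k' + s) \bmod \Phi_m) + q_0\kappa_2\\
&= (c_0 + c_1 s \bmod \Phi_m) + q_0\kappa_2 + 2^{r+1}(c_1 k' \bmod \Phi_m)\\
&= q_0\kappa_2 + 2^{r+1}(k + c_1 k' \bmod \Phi_m) + Z \qquad \text{(equality over } \mathbb{Z}[X]\text{)}
\end{aligned}$$

for some $\kappa_1, \kappa_2 \in \mathbb{Z}[X]/\Phi_m$. Moreover, since the coefficients of $c_1$ are smaller than $q_L \ll q_0$, the coefficients of $2^{r+1}(k + c_1 k' \bmod \Phi_m) + Z$ are still much smaller than $q_0$. Hence $\mathbf{z}$ decrypts under $\tilde{\mathbf{s}}$ and $q_0$ to $Z$, with plaintext space modulo $2^{r+1}$.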
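The following toy run checks Equation (2) numerically. It assumes $X^n + 1$ in place of $\Phi_m$, small ternary keys and insecure toy parameters; all variable names (`ct0` and `ct1` for $\tilde c_0, \tilde c_1$, and so on) are ours. It builds a $q_L$-ciphertext and a public-key encryption of $s$ under $\tilde s$, forms $\mathbf{z}$ as in Equation (2), and confirms that decrypting $\mathbf{z}$ and reducing modulo $2^{r+1}$ gives $Z$, from which the simplified formula recovers $a$.

```python
import random


def negacyclic_mul(a, b):
    """a(X) * b(X) in Z[X]/(X^n + 1) over the integers (a stand-in for reduction mod Phi_m)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj
    return out


def centered_mod(x, q):
    t = x % q
    return t - q if t > q // 2 else t


r, n = 10, 8
qL, q0, P = 2**r + 1, 2**61 - 1, 2 ** (r + 1)   # toy moduli; P = 2^{r+1} is the plaintext modulus
rng = random.Random(5)


def ternary():
    return [rng.choice([-1, 0, 1]) for _ in range(n)]


s, s_tilde = ternary(), ternary()                # q_L-key s and q_0-key s~ (toy, insecure)

# q_L-ciphertext c = (c0, c1) of a bit polynomial a, with centered coefficients.
a = [rng.randrange(2) for _ in range(n)]
c1 = [centered_mod(rng.randrange(qL), qL) for _ in range(n)]
e = [rng.randint(-2, 2) for _ in range(n)]
c0 = [centered_mod(ai + 2 * ei - v, qL) for ai, ei, v in zip(a, e, negacyclic_mul(c1, s))]

# Public key: q_0-ciphertext (ct0, ct1) with ct0 + ct1 * s~ = s + P * k' (mod q_0).
k_prime = [rng.randint(-2, 2) for _ in range(n)]
ct1 = [rng.randrange(q0) for _ in range(n)]
ct0 = [(si + P * ki - v) % q0 for si, ki, v in zip(s, k_prime, negacyclic_mul(ct1, s_tilde))]

# Equation (2): z = ([c0 + c1 * ct0]_{q0}, [c1 * ct1]_{q0}).
z0 = [(u + v) % q0 for u, v in zip(c0, negacyclic_mul(c1, ct0))]
z1 = [v % q0 for v in negacyclic_mul(c1, ct1)]

# Decrypting z under s~ and q_0, then reducing mod P, should give Z = <c, s> mod P.
dec = [centered_mod(u + v, q0) % P for u, v in zip(z0, negacyclic_mul(z1, s_tilde))]
Z = [(u + v) % P for u, v in zip(c0, negacyclic_mul(c1, s))]
bits = [((x >> r) ^ (x >> (r - 1)) ^ x) & 1 for x in Z]   # simplified decryption formula
print(dec == Z, bits == a)
```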

4.4  Step  Two:  Switching  to  CRT  Representation 

Now that we have an encryption of the polynomial $Z$, we want to perform the homomorphic bit-extraction procedure from Figure 1. However, this procedure should be applied to each coefficient of $Z$ separately, which is not directly supported by the native homomorphism of our cryptosystem. (For example, homomorphically squaring the ciphertext yields an encryption of the polynomial $Z^2 \bmod \Phi_m$ rather than squaring each coefficient of $Z$ separately.) We therefore need to convert $\mathbf{z}$ to CRT-based “packed” ciphertexts that hold the coefficients of $Z$ in their plaintext slots.
The system parameter $m$ was chosen so that $m = O(\lambda)$ and $\Phi_m(X)$ factors modulo 2 (and therefore also modulo $2^{r+1}$) as a product of degree-$d$ polynomials with $d = O(\log m)$, $\Phi_m(X) = \prod_{j=0}^{\ell-1} F_j(X) \pmod{2^{r+1}}$. This allows us to view the plaintext polynomial $Z(X)$ as having $\ell$ slots, with the $j$'th slot holding the value $Z(X) \bmod (F_j(X), 2^{r+1})$. This way, adding/multiplying/squaring the plaintext polynomials has the effect of applying the same operation on each of the slots separately.
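The slot structure can be seen on the smallest interesting example, $m = 7$, where $\Phi_7(X) \equiv (X^3+X+1)(X^3+X^2+1) \pmod 2$, so $d = 3$ and $\ell = 2$. The sketch below works modulo 2 only (the lift to $2^{r+1}$ is not shown) and represents $\mathbb{F}_2[X]$ polynomials as integer bitmasks; it checks that a product in $\mathbb{F}_2[X]/\Phi_7$ acts slot by slot. The function names are ours.

```python
def clmul(a, b):
    """Product of two F2[X] polynomials stored as int bitmasks (bit i = coefficient of X^i)."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        a <<= 1
        b >>= 1
    return res


def pmod(a, f):
    """Remainder of a modulo f in F2[X]."""
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


PHI7 = 0b1111111              # Phi_7(X) = X^6 + X^5 + ... + X + 1
FACTORS = (0b1011, 0b1101)    # X^3 + X + 1 and X^3 + X^2 + 1
assert clmul(*FACTORS) == PHI7


def slots(a):
    """The CRT plaintext slots of a in F2[X]/Phi_7: (a mod F_0, a mod F_1)."""
    return tuple(pmod(a, f) for f in FACTORS)


def mul(a, b):
    """Multiplication in the plaintext ring F2[X]/Phi_7."""
    return pmod(clmul(a, b), PHI7)


a, b = 0b101101, 0b011011
print(slots(mul(a, b)))
print(tuple(pmod(clmul(x, y), f) for x, y, f in zip(slots(a), slots(b), FACTORS)))
```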

In our case, we have $\phi(m)$ coefficients of $Z(X)$ that we want to put in the plaintext slots, and each ciphertext has only $\ell = \phi(m)/d$ slots, so we need $d$ ciphertexts to hold them all. The transformation from the single ciphertext $\mathbf{z}$ that encrypts $Z$ itself to the collection of $d$ ciphertexts that hold the coefficients of $Z$ in their slots is described in Section 4.7 below. (We describe that step last, since it is the most complicated and it builds on machinery that we develop for Step Four in Section 4.6.)

4.5  Step  Three:  Extracting  the  Relevant  Bits 

Once we have the coefficients of $Z$ in the plaintext slots, we can just repeat the procedure from Figure 1. The input to the bit-extraction procedure is a collection of $d$ ciphertexts, each of them holding $\ell = \phi(m)/d$ of the coefficients of $Z$ in its $\ell$ plaintext slots. (Recall that we chose $m = O(\lambda)$ such that $d = O(\log m)$.) Applying the procedure from Figure 1 to these ciphertexts will implicitly apply the bit extraction of Lemma 2 to each plaintext slot, thus leaving us with a collection of $d$ ciphertexts, each holding $\ell$ of the coefficients of $a$ in its plaintext slots.

4.6  Step  Four:  Switching  Back  to  Coefficient  Representation 

To finally complete the recryption process, we need to convert the $d$ ciphertexts holding the coefficients of $a$ in their plaintext slots into a single ciphertext that encrypts the polynomial $a$ itself. For this transformation, we appeal to the result of Gentry et al. [11], which says that every depth-$L$ circuit of average width $\tilde\Omega(\lambda)$ and size $T$ can be evaluated homomorphically in time $O(T) \cdot \mathrm{poly}(L, \log\lambda)$, provided that the inputs and outputs are presented in a packed form. Below we show that the transformation we seek can be computed on cleartext by a circuit of size $T = \tilde O(m)$ and depth $L = \mathrm{polylog}(m)$, and hence (since $m = O(\lambda)$) it can be evaluated homomorphically in time $\tilde O(m) = \tilde O(\lambda)$.

To use the result of Gentry et al., we must first reconcile an apparent “type mismatch”: that result requires that both input and output be presented in a packed CRT form, whereas we have input in CRT form but output in coefficient form. We therefore must interpret the output as “something in CRT representation” before we can use the result from [11]. The solution is obvious: since we want the output to be $a$ in coefficient representation, it is a polynomial that holds the value $A_j = a \bmod F_j$ in the $j$'th slot for all $j$.

Hence the transformation that we wish to compute takes as input the coefficients of the polynomial $a(X)$, and produces as output the polynomials $A_j = a \bmod F_j$ for $j = 0, 1, \ldots, \ell - 1$. It is important to note that our output consists of $\ell$ values, each of them a degree-$d$ binary polynomial. Since this output is produced by an arithmetic circuit, we need a circuit that operates on degree-$d$ binary polynomials, in other words an arithmetic circuit over $\mathrm{GF}(2^d)$. This circuit has $\ell \cdot d$ inputs (all of which happen to be elements of the base field $\mathbb{F}_2$), and $\ell$ outputs that belong to the extension field $\mathrm{GF}(2^d)$.

Theorem 1. Fix $m \in \mathbb{N}$, let $d \in \mathbb{N}$ be the smallest such that $m \mid 2^d - 1$, denote $\ell = \phi(m)/d$, and let $G \in \mathbb{F}_2[X]$ be a degree-$d$ irreducible polynomial over $\mathbb{F}_2$ (that fixes a particular representation of $\mathrm{GF}(2^d)$). Let $F_0(X), F_1(X), \ldots, F_{\ell-1}(X)$ be the irreducible (degree-$d$) factors of the $m$-th cyclotomic polynomial $\Phi_m(X)$ modulo 2.

Then there is an arithmetic circuit $\Pi_m$ over $\mathbb{F}_2[X]/G(X) = \mathrm{GF}(2^d)$ with $\phi(m)$ inputs $a_0, a_1, \ldots, a_{\phi(m)-1}$ and $\ell$ outputs $z_0, z_1, \ldots, z_{\ell-1}$, for which the following conditions hold:

- When the inputs are from the base field ($a_i \in \mathbb{F}_2$ for all $i$) and we denote $a(X) = \sum_i a_i X^i \in \mathbb{F}_2[X]$, then the outputs satisfy $z_j = a(X) \bmod (F_j(X), 2) \in \mathbb{F}_2[X]/G(X)$.

- $\Pi_m$ has depth $O(\log m)$ and size $O(m \log m)$.

The proof is in the full version. As an immediate corollary of Theorem 1 and the Gentry et al. result [11, Thm. 3], we have:

Corollary 2. There is an efficient procedure that, given $d$ ciphertexts encrypting $d$ polynomials that hold the coefficients of $a$ in their slots, computes a single ciphertext encrypting $a$. The procedure works in time $O(m) \cdot \mathrm{polylog}(m)$ (and uses at most $\mathrm{polylog}(m)$ levels of homomorphic evaluation).

4.7  Details  of  Step  Two 

The transformation of Step Two is roughly the inverse of the transformation that we described above for Step Four, with some added complications. In this step, we have the polynomial $Z(X)$ over the ring $\mathbb{Z}/2^{r+1}\mathbb{Z}$, and we view it as defining $\ell$ plaintext slots with the $j$'th slot containing $B_j \stackrel{\mathrm{def}}{=} Z \bmod (F_j, 2^{r+1})$. Note that the $B_j$'s are degree-$d$ polynomials, and we consider them as elements in the “extension ring” $\mathcal{R}_{2^{r+1}} \stackrel{\mathrm{def}}{=} \mathbb{Z}[X]/(G(X), 2^{r+1})$ (where $G$ is some fixed irreducible degree-$d$ polynomial modulo $2^{r+1}$).

Analogous to Theorem 1, we would like to argue that there is an arithmetic circuit over $\mathcal{R}_{2^{r+1}}$ that gets as input the $B_j$'s (as elements of $\mathcal{R}_{2^{r+1}}$) and outputs all the coefficients of $Z$ (which are elements of the base ring $\mathbb{Z}/2^{r+1}\mathbb{Z}$). Then we could appeal again to the result of Gentry et al. [11] to conclude that this circuit can be evaluated homomorphically with only polylog overhead.

For the current step, however, the arithmetic circuit would contain not only addition and multiplication gates, but also Frobenius map gates. Namely, gates $\rho_k(\cdot)$ (for $k \in \{1, 2, \ldots, d-1\}$) computing the functions

$$\rho_k(a(X)) = a(X^{2^k}) \bmod (G(X), 2^{r+1}).$$

It was shown in [11] that arithmetic circuits with Frobenius map gates can also be evaluated homomorphically with only polylog overhead, the Frobenius operations being simply an additional automorphism operation which can be applied homomorphically to ciphertexts.
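Modulo 2, the Frobenius gate is simply repeated squaring: $\rho_k(a) = a(X^{2^k}) = a(X)^{2^k}$ in $\mathrm{GF}(2^d)$, because squaring is additive in characteristic 2 and fixes the coefficients 0 and 1. The sketch below checks this identity for the irreducible polynomial $G(X) = X^8 + X^4 + X^3 + X + 1$, chosen only as an example; the function names are ours. It does not cover the modulo-$2^{r+1}$ case used in the text, where $\rho_k$ is still an automorphism but no longer a power map.

```python
def clmul(a, b):
    """Product in F2[X]; polynomials are int bitmasks (bit i = coefficient of X^i)."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        a <<= 1
        b >>= 1
    return res


def pmod(a, g):
    """Remainder of a modulo g in F2[X]."""
    dg = g.bit_length()
    while a.bit_length() >= dg:
        a ^= g << (a.bit_length() - dg)
    return a


def substitute_power(a, e):
    """a(X^e) in F2[X]: move the coefficient of X^i to X^(i*e)."""
    res, i = 0, 0
    while a >> i:
        if (a >> i) & 1:
            res |= 1 << (i * e)
        i += 1
    return res


def rho(a, k, g):
    """Frobenius gate rho_k(a) = a(X^(2^k)) mod (G, 2), i.e. the mod-2 case only."""
    return pmod(substitute_power(a, 2**k), g)


def square_k_times(a, k, g):
    """a^(2^k) mod G computed by k successive squarings."""
    a = pmod(a, g)
    for _ in range(k):
        a = pmod(clmul(a, a), g)
    return a


G = 0b100011011                   # X^8 + X^4 + X^3 + X + 1, irreducible over F2
print(rho(0b10, 1, G), square_k_times(0b10, 1, G))
```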

Theorem 2. Fix $m, r \in \mathbb{N}$, let $d \in \mathbb{N}$ be the smallest such that $m \mid 2^d - 1$, denote $\ell = \phi(m)/d$, and let $G(X)$ be a degree-$d$ irreducible polynomial over $\mathbb{Z}/2^{r+1}\mathbb{Z}$ (that fixes a particular representation of $\mathcal{R}_{2^{r+1}}$). Let $F_0(X), F_1(X), \ldots, F_{\ell-1}(X)$ be the irreducible (degree-$d$) factors of the $m$-th cyclotomic polynomial $\Phi_m(X)$ modulo $2^{r+1}$.

Then there is an arithmetic circuit $\Psi_{m,r}$ with Frobenius-map gates over $\mathcal{R}_{2^{r+1}}$ that has $\ell$ inputs $B_0, \ldots, B_{\ell-1}$ and $\phi(m)$ outputs $Z_0, Z_1, \ldots, Z_{\phi(m)-1}$, for which the following conditions hold:

- On any inputs $B_0, \ldots, B_{\ell-1} \in \mathcal{R}_{2^{r+1}}$, the outputs of $\Psi_{m,r}$ are all in the base ring, $Z_i \in \mathbb{Z}/2^{r+1}\mathbb{Z}$ for all $i$. Moreover, denoting $Z(X) = \sum_i Z_i X^i$, it holds that $Z(X) \bmod (F_j(X), 2^{r+1}) = B_j$ for all $j$.

- $\Psi_{m,r}$ has depth $O(\log m + d)$ and size $O(m(d + \log m))$.

The proof is in the full version. As before, a corollary of Theorem 2 and the
result from [11] is the following:

Corollary 3. There is an efficient procedure that, given a single ciphertext encrypting
$Z'$, outputs $d$ ciphertexts encrypting $d$ polynomials that hold the coefficients
of $Z'$ in their plaintext slots. The procedure works in time $O(m)$ (and
uses at most polylog($m$) levels of homomorphic evaluation).
4.8 An Alternative Variant

The procedure from Section 4.7 works in time $O(m)$, but it is still quite expensive.
One alternative is to put in the public key not just one ciphertext encrypting
the $q_1$-secret-key $s$, but rather $d$ ciphertexts encrypting polynomials that hold the
coefficients of $s$ in their plaintext slots. Then, rather than using the simple
formula from Equation (2) above, we evaluate homomorphically the inner product
of $\mathbf{s} = (1, s)$ and $\mathbf{c} = (c_0, c_1)$ modulo $\Phi_m(X)$ and $2^{r+1}$. This procedure will be
even faster if instead of the coefficients of $s$ we encrypt their transformed image
under a length-$m$ DFT. Then we can compute the DFT of $c_1$ (in the clear), multiply
it homomorphically by the encrypted transformed $s$ (in SIMD fashion), and
then homomorphically compute the inverse DFT and the reduction modulo $\Phi_m$.
Unfortunately this procedure still requires that we compute the reduction modulo
$\Phi_m$ homomorphically, which is likely to be the most complicated part of
bootstrapping. Finding a method that does not require this homomorphic
polynomial modular reduction is an interesting open problem.
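The arithmetic behind the DFT variant can be sketched in the clear (no encryption at all) as follows. This is a minimal Python illustration under assumed toy parameters ($m = 15$, $2^{r+1} = 32$) with hypothetical helper names; a naive complex DFT stands in for whatever transform an implementation would actually use. It checks that computing $s \cdot c_1$ as a length-$m$ cyclic convolution via the DFT, and only then reducing modulo $\Phi_m$, gives the same result as multiplying directly modulo $\Phi_m$. This works because $\Phi_m(X)$ divides $X^m - 1$.

```python
import cmath
import random


def poly_divmod_monic(a, f):
    """Divide integer polynomial a by monic f (coefficient lists, lowest degree first)."""
    a = list(a)
    df = len(f) - 1
    q = [0] * max(len(a) - df, 1)
    for i in range(len(a) - 1, df - 1, -1):
        c = a[i]
        if c:
            q[i - df] = c
            for j in range(df + 1):
                a[i - df + j] -= c * f[j]
    return q, (a + [0] * df)[:df]


def cyclotomic(m):
    """Phi_m(X) = (X^m - 1) divided by Phi_d(X) for every proper divisor d of m."""
    num = [-1] + [0] * (m - 1) + [1]
    for d in range(1, m):
        if m % d == 0:
            num, _ = poly_divmod_monic(num, cyclotomic(d))
    return num


def dft(v, inverse=False):
    """Naive DFT of length len(v) over the complex numbers."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(x * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i, x in enumerate(v))
           for k in range(n)]
    return [z / n for z in out] if inverse else out


def pad(p, m):
    return p + [0] * (m - len(p))


def inner_product_via_dft(c0, c1, s, m, modulus):
    """[c0 + s*c1] mod (Phi_m, modulus), with s*c1 computed as a length-m cyclic convolution."""
    s_hat = dft(pad(s, m))     # in the text, this transformed s would be encrypted
    c1_hat = dft(pad(c1, m))   # computed in the clear
    prod = dft([x * y for x, y in zip(s_hat, c1_hat)], inverse=True)
    cyclic = [round(z.real) for z in prod]                 # s*c1 mod (X^m - 1)
    total = [x + y for x, y in zip(pad(c0, m), cyclic)]
    _, rem = poly_divmod_monic(total, cyclotomic(m))       # Phi_m divides X^m - 1
    return [x % modulus for x in rem]


def inner_product_direct(c0, c1, s, m, modulus):
    """Reference computation: multiply, then reduce modulo (Phi_m, modulus)."""
    prod = [0] * (len(s) + len(c1) - 1)
    for i, x in enumerate(s):
        for j, y in enumerate(c1):
            prod[i + j] += x * y
    for i, x in enumerate(c0):
        prod[i] += x
    _, rem = poly_divmod_monic(prod, cyclotomic(m))
    return [x % modulus for x in rem]


rng = random.Random(0)
m, modulus = 15, 2 ** 5                    # hypothetical: 2^(r+1) with r = 4
phi = len(cyclotomic(m)) - 1               # phi(15) = 8
s = [rng.choice([-1, 0, 1]) for _ in range(phi)]
c0 = [rng.randrange(modulus) for _ in range(phi)]
c1 = [rng.randrange(modulus) for _ in range(phi)]
assert inner_product_via_dft(c0, c1, s, m, modulus) == inner_product_direct(c0, c1, s, m, modulus)
```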
Abstract

BGV-style homomorphic encryption schemes over polynomial rings rely for their security on rings
of very large dimension. This large dimension is needed because of the large modulus-to-noise ratio in
the key-switching matrices that are used for the top few levels of the evaluated circuit. However, larger
noise (and hence a smaller modulus-to-noise ratio) is used in lower levels of the circuit, so from a security
standpoint it is permissible to switch to lower-dimension rings, thus speeding up the homomorphic
operations for the lower levels of the circuit. Implementing such ring switching is nontrivial, however,
since these schemes rely on the ring algebraic structure for their homomorphic properties.

A basic ring-switching operation was used by Brakerski, Gentry and Vaikuntanathan, over polynomial
rings of the form $\mathbb{Z}[X]/(X^{2^n} + 1)$, in the context of bootstrapping. In this work we generalize and
extend this technique to work over any cyclotomic ring, and show how it can be used not only for
bootstrapping but also during the computation itself (in conjunction with the “packed ciphertext” techniques
of Gentry, Halevi and Smart).


Note:  A  later  version  of  this  work,  with  a  substantially  different  transformation,  appears  in  SCN  2012. 
1 Introduction


The last year has seen a rapid advance in the state of fully homomorphic encryption; yet despite these
advances the existing schemes are still too inefficient for most practical purposes. In this paper we take
another step towards making such schemes more efficient. In particular, we present a technique to reduce
the dimension of the ring needed for homomorphic computation of the lower levels of a circuit. Our
techniques apply to homomorphic encryption schemes over polynomial rings, such as the scheme of Brakerski
et al. [6, 7, 5], as well as the variants due to López-Alt et al. [15] and Brakerski [4].

The most efficient variants of all these schemes work over polynomial rings of the form $\mathbb{Z}[X]/F(X)$,
and in all of them the ring dimension (which is the degree of $F(X)$) must be set high enough to ensure
security: to be able to handle depth-$L$ circuits, these schemes must use key-switching matrices with a
modulus-to-noise ratio of $2^{\Omega(L)}$, hence the ring dimension must also be $\Omega(L \cdot \mathrm{polylog}(\lambda))$ (even if we assume
that ring-LWE is hard to within fully exponential factors).^1 In practice, the ring dimension for moderately
deep circuits can easily be many thousands. For example, to be able to evaluate AES homomorphically,
Gentry et al. used in [14] circuits of depth $L > 50$, with a corresponding ring dimension of over 50000.

As homomorphic operations proceed, the noise in the ciphertext grows (or the modulus shrinks, if we
use the modulus-switching technique from [7, 5]), hence reducing the modulus-to-noise ratio. Consequently,
it becomes permissible to start using lower-dimension rings in order to speed up further homomorphic
computation. However, in the middle of the computation we already have evaluated ciphertexts over the
big ring, and so we need a method for transforming these into small-ring ciphertexts that encrypt the same
thing. Such a “ring switching” procedure was described by Brakerski et al. [5], in the context of reducing
the ciphertext size prior to bootstrapping. The procedure in [5], however, is specific to polynomial rings
of the form $R_{2^n} = \mathbb{Z}[X]/(X^{2^n} + 1)$, and moreover by itself it cannot be combined with the “packed
evaluation” techniques of Gentry et al. [12]. Extending this procedure is the focus of this work.

1.1  Our  Contribution 

In this work we present two complementary techniques:

• We extend the procedure from [5] to any cyclotomic ring $R_m = \mathbb{Z}[X]/\Phi_m(X)$ for a composite $m$.
This is important, since the tools from [12] for working with “packed” ciphertexts require that we
work with an odd parameter $m$. For $m = u \cdot w$, we show how to break a ciphertext over the big
ring $R_m$ into a collection of $u = m/w$ ciphertexts over the smaller ring $R_w = \mathbb{Z}[X]/\Phi_w(X)$, such
that the plaintext polynomial encrypted in the original big-ring ciphertext can be recovered as a simple
linear function of the plaintext polynomials encrypted in the smaller-ring ciphertexts.

• We then show how to take a “packed” big-ring ciphertext that contains many plaintext elements in its
plaintext slots, and distribute these plaintext elements among the plaintext slots of several small-ring
ciphertexts. If the original big-ring ciphertext was “sparse” (i.e., if only a few of its plaintext slots were
used), then our technique yields just a small number of small-ring ciphertexts, only as many as needed
to fit all the used plaintext slots.

The first technique on its own may be useful in the context of bootstrapping, but it is not enough to
achieve our goal of reducing the computational overhead by switching to small-ring ciphertexts, since we

^1 The schemes from [5, 4] can replace large rings by using higher-dimension vectors over smaller rings. But their most efficient
variants use big rings and low-dimension vectors, since the complexity of their key-switching step is quadratic in the dimension of
these vectors.
still need to show how to perform homomorphic operations on the resulting small-ring ciphertexts. This is
achieved by utilizing the second technique. To demonstrate the usefulness of the second technique, consider
the application of homomorphic AES computation [14], where the original big-ring ciphertext contains only
16 plaintext elements (corresponding to the 16 bytes of the AES state). If the small-ring ciphertexts have 16 or
more plaintext slots, then we can convert the original big-ring ciphertext into a single small-ring ciphertext
containing the same 16 bytes in its slots, then continue the computation on this smaller ciphertext.

1.2  An  Overview  of  the  Construction 

Our starting point is the polynomial composition technique of Brakerski et al. [5]. When $m = u \cdot w$,
a polynomial of degree up to $m - 1$, $a(X) = \sum_{i=0}^{m-1} a_i X^i$, can be broken into $u$ polynomials of degree up
to $w - 1$ by splitting the coefficients of $a$ according to their index modulo $u$. Namely, denoting by $a_{(k)}$ the
polynomial with coefficients $a_k, a_{k+u}, a_{k+2u}, \ldots$, we have

$$a(X) = \sum_{k=0}^{u-1}\sum_{j=0}^{w-1} a_{k+uj} X^{k+uj} = \sum_{k=0}^{u-1} X^k \sum_{j=0}^{w-1} a_{k+uj} (X^u)^j = \sum_{k=0}^{u-1} X^k \cdot a_{(k)}(X^u). \qquad (1)$$
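Equation (1) translates directly into code. In this Python sketch (function names are ours; polynomials are coefficient lists with the lowest degree first, and $u$ is assumed to divide $m$), `split_parts` computes $T(a) = (a_{(0)}, \ldots, a_{(u-1)})$ and `recombine` evaluates the right-hand side of Equation (1). The last check illustrates that the splitting is linear.

```python
def split_parts(a, u):
    """T(a): the k-th part holds the coefficients a_k, a_{k+u}, a_{k+2u}, ..."""
    return [a[k::u] for k in range(u)]


def recombine(parts, u):
    """Evaluate sum_k X^k * a_(k)(X^u), returned as a coefficient list of length u * w."""
    w = len(parts[0])
    a = [0] * (u * w)
    for k, part in enumerate(parts):
        for j, c in enumerate(part):
            a[k + u * j] += c
    return a


a = list(range(12))               # a(X) = 0 + 1*X + 2*X^2 + ... + 11*X^11, so m = 12
parts = split_parts(a, 3)         # u = 3, w = 4
assert parts == [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
assert recombine(parts, 3) == a   # Equation (1) recovers a

# Linearity of T: splitting a sum gives the sum of the splittings.
b = [(7 * i) % 5 for i in range(12)]
a_plus_b = [x + y for x, y in zip(a, b)]
assert split_parts(a_plus_b, 3) == [
    [x + y for x, y in zip(pa, pb)] for pa, pb in zip(split_parts(a, 3), split_parts(b, 3))
]
```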

We note that this “very syntactic” transformation (of splitting the coefficients of a big-ring polynomial into
several small-ring polynomials) has the following crucial algebraic properties:

1. The end result is a collection of “parts” all from the small ring $R_w$ (which is a sub-ring of the
big ring $R_m$, since $w \mid m$).

2. Recalling that $f(X) \mapsto f(X^u)$ is an embedding of $R_w$ inside $R_m$, we have the property that the
original $a$ can be recovered as a simple linear combination of (the embedding of) the parts $a_{(k)}$.

3. Moreover, the transformation $T(a) = (a_{(0)}, \ldots, a_{(u-1)})$ is linear, and as such it commutes with the
linear operations inside the decryption formula of BGV-type schemes: if $s$ is a big-ring secret key and
$c$ is (part of) a big-ring ciphertext, then decryption over the big ring includes computing $a = s \cdot c \in R_m$
(and later reducing $a$ mod $q$ and mod 2). Due to linearity, the parts of $a$ can be expressed in terms of
the tensor product between the parts of $s$ and $c$. Namely, $T(s \cdot c)$ is some linear function (over the
small ring $R_w$) of $T(s) \otimes T(c)$.

In addition to these algebraic properties, in the case considered in [5] where $m, w$ are powers of two, it turns
out that this transformation also possesses the following geometric property:

4. If $a$ is a low-norm element in $R_m$, then all the parts in $T(a)$ are low-norm elements in $R_w$.

The importance of this last property stems from the fact that a valid ciphertext in a BGV-type homomorphic
encryption scheme must have low noise, namely its inner product with the unknown secret key must be a
low-norm polynomial. Property 3 above is used to convert a big-ring ciphertext encrypting $a$ (relative to a
big-ring secret key $s$) into a collection of “syntactically correct” small-ring ciphertexts encrypting the $a_{(k)}$'s
(relative to the small-ring secret key $T(s)$), and Property 4 is used to argue that these small-ring ciphertexts
are indeed valid.

When attempting to apply the same transformation for $m, w$ that are not powers of two, it turns out that
the algebraic properties still all hold, but the geometric property may not. One plausible solution is
to find a different transformation $T(\cdot)$ for breaking a big-ring element into a vector of small-ring elements
that has all the properties 1-4 above, even when $m, w$ are not powers of two. In the current work, however,
we stick to the same transformation $T(\cdot)$ as in [5], and address the problem with the geometric property by
“lifting” everything from the big ring $R_m = \mathbb{Z}[X]/\Phi_m(X)$ to the even bigger ring $C_m = \mathbb{Z}[X]/(X^m - 1)$,
using techniques similar to [12, 9].

The reason that lifting to $C_m$ helps is that over the bigger ring $C_m$, the linear combination from
Equation (1) is in fact a “direct sum”, in the sense that every coefficient of the left-hand side comes from exactly
one of the terms on the right. Thus if the result is a low-norm polynomial then all the summands must also
be low-norm polynomials, which is what we need.^2

A Key-Switching Optimization. One source of inefficiency in the ring-switching procedure of Brakerski
et al. [5] is that using the tensor product $T(s) \otimes T(c)$ amounts essentially to having $u$ small-ring ciphertexts,
each of which is a dimension-$u$ vector over the small ring. Brakerski et al. point out that we can use
key-switching/dimension-reduction to convert these high-dimension ciphertexts into low-dimension ciphertexts
over the small ring, but processing $u$ ciphertexts of dimension $u$ requires work quadratic in $u$. Instead, here
we describe an alternative procedure that saves a factor of $u$ in running time:

Before using $T(\cdot)$ to break the ciphertext into pieces, we apply key-switching over the big ring to get
a ciphertext with respect to another secret key that happens to belong to the small ring (which, we note
again, is a sub-ring of $R_m$). The transformation $T(\cdot)$ has the additional property that when applied to a
small-ring element $s' \in R_w \subset R_m$, the resulting vector $T(s')$ over $R_w$ has just a single nonzero element (namely
$s'$ itself). Hence $T(s') \otimes T(c)$ is the same as just $s' \cdot T(c)$, and this lets us work directly with low-dimension
ciphertexts over the small ring (as opposed to ciphertexts of dimension $u$). This is described in Section 3.1,
where we prove that key-switching into a key from the small subring is secure as long as ring-LWE [16] is
hard in that small subring.
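The single-nonzero-part property can be checked numerically. The Python sketch below uses toy sizes $u = 3$, $w = 5$ and hypothetical helper names, and it works in the cyclic rings $C_m$ and $C_w$, where the identity is exact. For $s'' = s'(X^u)$, the parts of $s'' \cdot c \bmod (X^m - 1)$ are just $s' \cdot c_{(k)} \bmod (X^w - 1)$, so no $u$-dimensional tensor product is needed.

```python
import random


def split_parts(a, u):
    """T(a): the k-th part holds the coefficients a_k, a_{k+u}, a_{k+2u}, ..."""
    return [a[k::u] for k in range(u)]


def embed(f, u):
    """The embedding f(X) -> f(X^u): f's coefficients move to exponents 0, u, 2u, ..."""
    out = [0] * (u * len(f))
    out[::u] = f
    return out


def cyclic_mul(a, b, n):
    """Product in Z[X]/(X^n - 1): exponents wrap around modulo n."""
    out = [0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % n] += x * y
    return out


u, w = 3, 5                                            # hypothetical toy sizes
m = u * w
rng = random.Random(11)
s_small = [rng.choice([-1, 0, 1]) for _ in range(w)]   # s' in C_w
c = [rng.randrange(-50, 51) for _ in range(m)]         # one ciphertext component in C_m
s_big = embed(s_small, u)                              # s'' = s'(X^u) in C_m

# T(s'') has a single nonzero part, namely s' itself ...
assert split_parts(s_big, u) == [s_small] + [[0] * w] * (u - 1)
# ... hence the parts of s'' * c are just s' * c_(k), computed in the small ring C_w.
assert split_parts(cyclic_mul(s_big, c, m), u) == [
    cyclic_mul(s_small, c_k, w) for c_k in split_parts(c, u)
]
```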

Packed Ciphertexts. As sketched so far, the ring-switching procedure lets us convert a big-ring ciphertext
encrypting a polynomial $a \in R_m$ into a collection of $u$ small-ring ciphertexts encrypting the parts $a_{(k)} \in
R_w$. However, coming in the middle of homomorphic evaluation, we may need to get small-ring ciphertexts
encrypting things other than the $a_{(k)}$'s. Specifically, if the original polynomial $a$ encodes several plaintext
elements in its plaintext slots (as in [19, 12]), we may want to get encryptions of small-ring polynomials that
encode the same elements in their slots.

We note that the plaintext elements encoded in the polynomial $a \in R_m$ are the evaluations $a(\zeta_i)$, where
the $\zeta_i$'s are primitive $m$-th roots of unity in some extension field $\mathbb{F}_{2^d}$. (Equivalently, the evaluations $a(\zeta_i)$
can also be described as $a \bmod \mathfrak{p}_i$, where $\mathfrak{p}_i$ is some prime ideal in the ring $R_m$, specifically the ideal
generated by $\{2, X - \zeta_i\}$. Noting that these prime ideals are exactly the factors of 2 in $R_m$, this evaluation
representation over $GF(2^d)$ is nothing more than Chinese remaindering over the prime factors of 2 in $R_m$.)

Similarly, the plaintext elements encoded in a polynomial $b \in R_w$ are the evaluations $b(\tau_j)$, where the
$\tau_j$'s are primitive $w$-th roots of unity (equivalently, the residues of $b$ relative to the prime ideal factors of 2
in $R_w$). Our goal, then, is to decompose a big-ring ciphertext encrypting $a$ into small-ring ciphertexts
encrypting some $b_t$'s, such that for every $i$ there are some $t, j$ for which $b_t(\tau_j) = a(\zeta_i)$.

To that end, we interpret Equation (1) as expressing the value of $a$ at an arbitrary point $X$ as a linear
combination of the values of the $a_{(k)}$'s at the point $X^u$ (with coefficients $1, X, \ldots, X^{u-1}$). Observing
that if $\zeta$ is an $m$-th root of unity then $\tau = \zeta^u$ is a $w$-th root of unity, we thus obtain a method of expressing
the values of $a$ in the $m$-th roots of unity as linear combinations of the values of the $a_{(k)}$'s in the $w$-th
roots of unity. In Lemma 6 in Section 4 we show how to express, under some conditions on $m$ and $w$, the

^2 In the power-of-two setting considered in [5], the same “direct sum” argument can be applied directly in the big ring $R_{2^n}$,
hence they do not need the “lifting” technique.
coefficients of the linear combination from Equation (1) as (low-norm) polynomials in the $\tau_j$'s. This allows
us to compute the encryptions of the $b_t$'s that we seek as low-weight linear combinations of the encryptions of
the $a_{(k)}$'s that we obtained before.
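The evaluation form of Equation (1) that this argument relies on can be spot-checked over the complex numbers. The Python sketch below (hypothetical names, a random toy polynomial; it checks only the identity itself, not the coefficients from Lemma 6) verifies that for every primitive $m$-th root of unity $\zeta$, $a(\zeta) = \sum_k \zeta^k a_{(k)}(\tau)$ with $\tau = \zeta^u$.

```python
import cmath
import math
import random


def evaluate(poly, x):
    """Horner evaluation of a coefficient list (lowest degree first) at x."""
    acc = 0
    for c in reversed(poly):
        acc = acc * x + c
    return acc


def max_identity_error(a, m, u):
    """Largest |a(zeta) - sum_k zeta^k * a_(k)(zeta^u)| over all primitive m-th roots zeta."""
    parts = [a[k::u] for k in range(u)]
    worst = 0.0
    for i in range(1, m):
        if math.gcd(i, m) != 1:
            continue                       # only primitive m-th roots of unity
        zeta = cmath.exp(2j * cmath.pi * i / m)
        tau = zeta ** u                    # a primitive (m/u)-th root of unity
        rhs = sum(zeta ** k * evaluate(parts[k], tau) for k in range(u))
        worst = max(worst, abs(evaluate(a, zeta) - rhs))
    return worst


rng = random.Random(1)
a = [rng.randrange(-5, 6) for _ in range(15)]   # a toy polynomial of degree < m = 15
assert max_identity_error(a, 15, 3) < 1e-9      # the identity holds up to rounding error
```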

A bird's-eye view of this last transformation is that the linear transformation $T(a)$ that we used to break
the plaintext big-ring element into a vector of small-ring parts has the side effect of inducing some linear
transformation (over $\mathbb{F}_{2^d}$) on the contents of the plaintext slots. Hence after we apply $T$, we compute
homomorphically the inverse linear transformation, thereby recovering the original content.

2  Notation  and  Preliminaries 

Below we define the various algebraic structures that we need for this work. In this paper we will be utilizing
various rings at different points, all associated to rings of roots of unity. Below let $m, q$ be arbitrary
positive integers. Let $\Phi_m(X)$ denote the $m$-th cyclotomic polynomial, i.e., $\Phi_m(X) = \prod_{i \in (\mathbb{Z}/m\mathbb{Z})^*} (X - \zeta_m^i)$,
where $\zeta_m = e^{2\pi\sqrt{-1}/m}$ is the complex primitive $m$-th root of unity. Recalling that $\Phi_m(X)$ is an integer
polynomial, we define the following rings:

$$R_m = \mathbb{Z}[X]/\Phi_m(X), \qquad C_m = \mathbb{Z}[X]/(X^m - 1),$$

$$R_{m,q} = \mathbb{Z}[X]/(\Phi_m(X), q), \qquad C_{m,q} = \mathbb{Z}[X]/(X^m - 1, q).$$

We  will  be  interested  in  cyclotomic  rings  for  a  composite  m  =  u  ■  w. 

The size of polynomials. Throughout this work we frequently refer to “low-norm polynomials”. The
norm that we use to measure the size of polynomials is the $\ell_2$ norm of their coefficient vectors, i.e., for a
polynomial $f$ we set $\mathrm{norm}(f) = \sqrt{\sum_i f_i^2}$. (Most of our treatment is not very sensitive to the choice of the
particular norm function.) We informally say that a polynomial in $R_{m,q}$ or $C_{m,q}$ has low norm when its
norm is much smaller than the parameter $q$.

The ring constant $c_m$. We sometimes need to switch back and forth between $R_{m,q}$ and $C_{m,q}$ while
maintaining “low-norm” polynomials. For every integer $m$ there exists a constant $c_m$ that bounds the increase in
norm due to reduction modulo $\Phi_m(X)$. Namely, for every polynomial $f$ of degree up to $m - 1$ it holds that
$\mathrm{norm}(f \bmod \Phi_m) \le c_m \cdot \mathrm{norm}(f)$.

Empirically, the constants $c_m$ for the parameters $m$ that we work with are rather small (ranging between
2 and 50 for typical values). But in principle, for very smooth $m$'s the constant $c_m$ can be super-polynomial
in $m$. For the rest of the paper we always assume that our parameters are chosen so that $q \gg c_m$, so that
we can take “low-norm” polynomials in $C_m$ and reduce them modulo $\Phi_m$ without increasing the norm too
much (relative to $q$). Note that the ring constant $c_m$ is different from, but related to, the associated ring
constant from [8, 12].
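A minimal Python sketch of these definitions follows (helper names are ours; polynomials are integer coefficient lists with the lowest degree first). It computes $\Phi_m(X)$ by dividing $X^m - 1$ by $\Phi_d(X)$ for every proper divisor $d$ of $m$. It then estimates the ring constant as the smallest $c_m$ satisfying the bound above, namely the largest singular value of the linear map $f \mapsto f \bmod \Phi_m$ on polynomials of degree below $m$. Taking that operator norm as $c_m$ is our assumption about how one would compute it, and power iteration is just one convenient way to approximate it.

```python
import math
import random


def poly_divmod_monic(a, f):
    """Divide integer polynomial a by monic f (coefficient lists, lowest degree first)."""
    a = list(a)
    df = len(f) - 1
    q = [0] * max(len(a) - df, 1)
    for i in range(len(a) - 1, df - 1, -1):
        c = a[i]
        if c:
            q[i - df] = c
            for j in range(df + 1):
                a[i - df + j] -= c * f[j]
    return q, (a + [0] * df)[:df]


def cyclotomic(m):
    """Phi_m(X) = (X^m - 1) / prod of Phi_d(X) over the proper divisors d of m."""
    num = [-1] + [0] * (m - 1) + [1]
    for d in range(1, m):
        if m % d == 0:
            num, _ = poly_divmod_monic(num, cyclotomic(d))
    return num


def ring_constant(m, iters=500, seed=0):
    """Power-iteration estimate of the smallest c_m with
    norm(f mod Phi_m) <= c_m * norm(f) for every f of degree < m."""
    phi = cyclotomic(m)
    cols = [poly_divmod_monic([0] * j + [1], phi)[1] for j in range(m)]  # X^j mod Phi_m
    rows = len(phi) - 1
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    est = 0.0
    for _ in range(iters):
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        mv = [sum(cols[j][i] * v[j] for j in range(m)) for i in range(rows)]  # M v
        est = math.sqrt(sum(x * x for x in mv))                                # ||M v||, ||v|| = 1
        v = [sum(cols[j][i] * mv[i] for i in range(rows)) for j in range(m)]  # M^T M v
    return est


assert cyclotomic(15) == [1, -1, 0, 1, -1, 1, 0, -1, 1]
assert abs(ring_constant(8) - math.sqrt(2)) < 1e-9   # Phi_8 = X^4 + 1
```

For $m = 8$ the reduction simply subtracts the top half of the coefficients from the bottom half, which is why the estimate is exactly $\sqrt{2}$ there.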

2.1  RLWE-based  BGV  Cryptosystems 

Below and throughout this work we denote by $[z]_q$ the reduction of the integer $z$ modulo the positive integer
$q$ into the symmetric interval $[-q/2, q/2)$. In our initial ring-LWE-based BGV cryptosystem, secret keys
and ciphertexts are 2-vectors over $R_{m,q}$ for some odd system parameter $q$, and moreover the secret key has
the form $\mathbf{s} = (1, s)$ where $s \in R_m$ is a low-norm polynomial (e.g., with coefficients in $\{-1, 0, 1\}$). The
native plaintext space for our initial BGV scheme will be $R_{m,2}$, namely binary polynomials modulo $\Phi_m(X)$.
A valid ciphertext $\mathbf{c} = (c_0, c_1) \in (R_{m,q})^2$ that encrypts the plaintext polynomial $a \in R_{m,2}$ with respect to
$\mathbf{s} = (1, s)$ satisfies the equality (over $R_m$)

$$[\langle \mathbf{c}, \mathbf{s} \rangle]_q = [c_0 + s \cdot c_1]_q = a + 2e, \qquad (2)$$

for some low-norm polynomial $e \in R_m$. Note that by $[c_0 + s \cdot c_1]_q$ we mean reducing each of the coefficients
of the polynomial $c_0 + s \cdot c_1 \in R_m$ into the interval $[-q/2, q/2)$. Decryption is then just computing
$[c_0 + s \cdot c_1]_q$, then reducing modulo 2 to recover the plaintext polynomial $a$.
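The decryption rule can be exercised end to end with a toy, symmetric-key version of this scheme. The Python sketch below is only an illustration under assumed parameters ($m = 15$, $q = 1009$, ternary secret and noise, hypothetical helper names). It is not secure and omits public-key encryption, but it shows Equation (2) in action and that adding ciphertexts adds the plaintexts modulo 2.

```python
import random


def poly_divmod_monic(a, f):
    """Divide integer polynomial a by monic f (coefficient lists, lowest degree first)."""
    a = list(a)
    df = len(f) - 1
    q = [0] * max(len(a) - df, 1)
    for i in range(len(a) - 1, df - 1, -1):
        c = a[i]
        if c:
            q[i - df] = c
            for j in range(df + 1):
                a[i - df + j] -= c * f[j]
    return q, (a + [0] * df)[:df]


def cyclotomic(m):
    """Phi_m(X) = (X^m - 1) / prod of Phi_d(X) over the proper divisors d of m."""
    num = [-1] + [0] * (m - 1) + [1]
    for d in range(1, m):
        if m % d == 0:
            num, _ = poly_divmod_monic(num, cyclotomic(d))
    return num


def ring_mul(a, b, phi):
    """Product in R_m = Z[X]/Phi_m(X), as a list of phi(m) integer coefficients."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return poly_divmod_monic(prod, phi)[1]


def centered(x, q):
    """[x]_q: the representative of x modulo q in the symmetric interval around 0."""
    r = x % q
    return r - q if r > q // 2 else r


M, Q = 15, 1009                 # hypothetical toy parameters; Q is odd
PHI = cyclotomic(M)
N = len(PHI) - 1                # phi(15) = 8 coefficients per ring element
rng = random.Random(2024)


def small():
    """A ternary polynomial, standing in for the low-norm key and noise distributions."""
    return [rng.choice([-1, 0, 1]) for _ in range(N)]


def encrypt(a, s):
    """Toy symmetric-key encryption: choose c1 at random and solve Equation (2) for c0."""
    c1 = [rng.randrange(Q) for _ in range(N)]
    e = small()
    s_c1 = ring_mul(s, c1, PHI)
    c0 = [(ai + 2 * ei - x) % Q for ai, ei, x in zip(a, e, s_c1)]
    return c0, c1


def decrypt(ct, s):
    """Compute [c0 + s*c1]_q (which equals a + 2e), then reduce modulo 2."""
    c0, c1 = ct
    s_c1 = ring_mul(s, c1, PHI)
    return [centered(x + y, Q) % 2 for x, y in zip(c0, s_c1)]


s = small()
a = [rng.randrange(2) for _ in range(N)]
b = [rng.randrange(2) for _ in range(N)]
ca, cb = encrypt(a, s), encrypt(b, s)
assert decrypt(ca, s) == a
# Adding ciphertexts adds the plaintexts modulo 2 (the noise terms add up as well).
c_sum = tuple([(x + y) % Q for x, y in zip(u, v)] for u, v in zip(ca, cb))
assert decrypt(c_sum, s) == [(x + y) % 2 for x, y in zip(a, b)]
```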

Throughout the paper we will switch back and forth between different rings. We will maintain the
invariant that valid ciphertexts always satisfy Equation (2), but the ring over which this equation is evaluated
(specifically the meaning of $s \cdot c_1$) will vary. In the input to the ring-switching procedure we will have
a ciphertext where that equality holds over $R_m$, at the end we will have the output ciphertexts for which
the equality holds over $R_w$, and at various intermediate points we will have that equality holding over $C_m$
or $C_w$.

2.2  Plaintext  Arithmetic 

Following [19, 5, 12, 13, 14] we consider plaintext polynomials $a \in R_{m,2}$ as encoding vectors of plaintext
elements from some finite field $\mathbb{F}_{2^d}$, where $d$ is the order of 2 in the group $(\mathbb{Z}/m\mathbb{Z})^*$. (This implies that
$d$ divides $\phi(m)$, and also that $\mathbb{F}_{2^d}$ contains primitive $m$-th roots of unity.) Denoting $\ell = \phi(m)/d$, we
can identify polynomials in $R_{m,2}$ with $\ell$-vectors of elements from $\mathbb{F}_{2^d}$. The specific mapping between
polynomials and vectors that we use is as follows:

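Consider the quotient group $(\mathbb{Z}/m\mathbb{Z})^*/\langle 2 \rangle$ (which has exactly $\ell$ elements), and fix a specific set of
representatives for this quotient group, $T_m = \{t_1, t_2, \ldots, t_\ell\} \subset (\mathbb{Z}/m\mathbb{Z})^*$, containing exactly one element
from every coset of $\langle 2 \rangle$ in $(\mathbb{Z}/m\mathbb{Z})^*$.^3 Also fix a specific primitive $m$-th root of unity $\zeta \in \mathbb{F}_{2^d}$, and
identify each polynomial $a \in R_{m,2}$ with the $\ell$-vector consisting of $a(\zeta^t)$ for all $t \in T_m$:

$$a \in R_{m,2} \;\longleftrightarrow\; \big(a(\zeta^{t_1}), \ldots, a(\zeta^{t_\ell})\big) \in (\mathbb{F}_{2^d})^\ell.$$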

Showing that this is indeed a one-to-one mapping is a standard exercise. In one direction, clearly from $a$ we
can compute all the values $a(\zeta^t)$. In the other direction we use the fact that since the coefficients of $a$ are
all in the base field $\mathbb{F}_2$, then $a(x^2) = a(x)^2$ for any $x \in \mathbb{F}_{2^d}$. In particular, from $a(\zeta^t)$ we can compute
$a(\zeta^{2t})$, $a(\zeta^{4t})$, and so on. Since $T_m$ is a complete set of representatives for the quotient group
$(\mathbb{Z}/m\mathbb{Z})^*/\langle 2 \rangle$, we can get this way the evaluations $a(\zeta^j)$ for all the indexes $j \in (\mathbb{Z}/m\mathbb{Z})^*$. This
gives us the evaluation of $a$ at $\phi(m)$ different points, from which we can interpolate $a$ itself.

We thus view the evaluation of the plaintext polynomial at $\zeta^{t_j}$ as the $j$-th “plaintext slot”, and note
that arithmetic operations in the ring $R_{m,2}$ act on the plaintext slots in a SIMD manner, namely point-wise
adding or multiplying the elements in the slots.

We can equivalently view this mapping as a Chinese-remaindering representation (which makes the
one-to-one argument and the SIMD property obvious, but requires careful choices for the representation of $\mathbb{F}_{2^d}$
in the different plaintext slots).
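To make the slot encoding concrete, here is a small Python sketch for $m = 15$ (so $d = 4$ and $\ell = 2$). The representation of $\mathbb{F}_{16}$ as $\mathbb{F}_2[Y]/(Y^4 + Y + 1)$, the choice $\zeta = Y$, the representatives $T_{15} = \{1, 7\}$, and all helper names are our own choices for the example. It checks that multiplication in $R_{15,2}$ acts slot-wise, and that the 256 polynomials in $R_{15,2}$ map to 256 distinct slot vectors, as the one-to-one argument predicts.

```python
# Hypothetical toy parameters: m = 15, so d = ord_15(2) = 4 and ell = phi(15) / d = 2.
GF_POLY, D = 0b10011, 4                   # F_16 = F_2[Y]/(Y^4 + Y + 1); bit i = coeff of Y^i
PHI_MOD2 = [1, 1, 0, 1, 1, 1, 0, 1, 1]    # Phi_15(X) mod 2, lowest degree first
ZETA = 0b0010                             # Y, which has multiplicative order 15 in F_16
T_M = [1, 7]                              # one representative of each coset of <2> in (Z/15Z)^*


def gf_mul(x, y):
    """Multiply in F_16: carry-less product, reduced by Y^4 + Y + 1 as we go."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> D) & 1:
            x ^= GF_POLY
    return r


def gf_pow(x, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, x)
    return r


def slots(a):
    """The plaintext slots a(zeta^t), t in T_m, of a binary polynomial a (Horner's rule)."""
    out = []
    for t in T_M:
        z, acc = gf_pow(ZETA, t), 0
        for c in reversed(a):
            acc = gf_mul(acc, z) ^ c      # addition in F_16 is XOR
        out.append(acc)
    return out


def mul_mod2(a, b):
    """Product of a and b in R_{15,2} = F_2[X]/(Phi_15(X))."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] ^= x & y
    n = len(PHI_MOD2) - 1
    for i in range(len(prod) - 1, n - 1, -1):
        if prod[i]:
            for j in range(n + 1):
                prod[i - n + j] ^= PHI_MOD2[j]
    return (prod + [0] * n)[:n]


a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [0, 1, 1, 0, 1, 0, 0, 1]
# Ring multiplication acts slot-wise (SIMD) ...
assert slots(mul_mod2(a, b)) == [gf_mul(x, y) for x, y in zip(slots(a), slots(b))]
# ... and the 2^8 elements of R_{15,2} map to 2^8 distinct slot vectors.
assert len({tuple(slots([(n >> i) & 1 for i in range(8)])) for n in range(256)}) == 256
```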

2.3  Breaking  Polynomials  in  Parts 

As sketched in the introduction, our approach is rooted in the technique for assembling a high-degree
polynomial from low-degree parts by interleaving the coefficients of the parts. Alternatively, we can view this

^3 In other words, the sets $T_m, 2T_m, 4T_m, \ldots, 2^{d-1}T_m$ are all disjoint, and their union is the entire group $(\mathbb{Z}/m\mathbb{Z})^*$.
as breaking a high-degree polynomial into small-degree parts. Recall that for a polynomial $a$ of degree up
to $m - 1$, and for any integer $u < m$, we can break $a$ into $u$ parts of degree less than $w = \lceil m/u \rceil$, denoted
$a_{(0)}, \ldots, a_{(u-1)}$, by splitting the coefficients of $a$ according to their index mod $u$, thus obtaining (with the
convention that $a_i = 0$ for $i \ge m$)

$$a(X) = \sum_{k=0}^{u-1}\sum_{j=0}^{w-1} a_{k+uj} X^{k+uj} = \sum_{k=0}^{u-1} X^k \Big(\sum_{j=0}^{w-1} a_{k+uj} X^{uj}\Big) = \sum_{k=0}^{u-1} X^k \cdot a_{(k)}(X^u).$$

Of particular interest to us will be the case where $m = u \cdot w$, where working with polynomials of degree less
than $w$ that are evaluated at $X^u$ allows us to move between big rings and small rings. The following lemma will be
useful later in the paper.

Lemma 1. Let $m, w$ be positive integers such that $w$ divides $m$, and let $u = m/w$. Also let $\Phi_m(X), \Phi_w(X)$
be the $m$-th and $w$-th cyclotomic polynomials, respectively.

a. Consider three polynomials $f(X), g(X), h(X)$ of degree at most $\phi(w) - 1$. If $h(X) = f(X) \cdot g(X)$
$\pmod{\Phi_w(X)}$ then $h(X^u) = f(X^u) \cdot g(X^u) \pmod{\Phi_m(X)}$.

b. Consider three polynomials $f(X), g(X), h(X)$ of degree at most $w - 1$. If $h(X) = f(X) \cdot g(X)$
$\pmod{X^w - 1}$ then $h(X^u) = f(X^u) \cdot g(X^u) \pmod{X^m - 1}$.

Proof. a. Since $h(X) = f(X) \cdot g(X) \pmod{\Phi_w(X)}$, for every primitive $w$-th root of unity $\tau$ (say,
over the complex field) we have $h(\tau) = f(\tau) \cdot g(\tau)$. Let us denote $\hat{f}(X) = f(X^u) \bmod \Phi_m(X)$,
$\hat{g}(X) = g(X^u) \bmod \Phi_m(X)$, and $\hat{h}(X) = h(X^u) \bmod \Phi_m(X)$. Then for every primitive $m$-th root of unity $\zeta$ we
have

$$\hat{f}(\zeta) \cdot \hat{g}(\zeta) = f(\zeta^u) \cdot g(\zeta^u) \stackrel{(*)}{=} h(\zeta^u) = \hat{h}(\zeta),$$

where the equality (*) follows since $\zeta^u$ is a primitive $w$-th root of unity whenever $\zeta$ is a primitive $m$-th root of unity.
Since $\hat{f} \cdot \hat{g}$ has the same evaluations as $\hat{h}$ on all the primitive $m$-th roots of unity, it follows that
$\hat{f} \cdot \hat{g} = \hat{h} \pmod{\Phi_m(X)}$, as needed.

b. The proof is identical to Part a, except that we consider all $w$-th and $m$-th roots of unity, not just the
primitive roots. □
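Lemma 1 is also easy to sanity-check numerically. The Python sketch below (hypothetical helper names; integer coefficient lists, lowest degree first; random small inputs) tests both parts for a couple of $(m, w)$ pairs. It is only a spot check, not a substitute for the proof.

```python
import random


def poly_divmod_monic(a, f):
    """Divide integer polynomial a by monic f (coefficient lists, lowest degree first)."""
    a = list(a)
    df = len(f) - 1
    q = [0] * max(len(a) - df, 1)
    for i in range(len(a) - 1, df - 1, -1):
        c = a[i]
        if c:
            q[i - df] = c
            for j in range(df + 1):
                a[i - df + j] -= c * f[j]
    return q, (a + [0] * df)[:df]


def cyclotomic(m):
    """Phi_m(X) = (X^m - 1) / prod of Phi_d(X) over the proper divisors d of m."""
    num = [-1] + [0] * (m - 1) + [1]
    for d in range(1, m):
        if m % d == 0:
            num, _ = poly_divmod_monic(num, cyclotomic(d))
    return num


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def mod(a, f):
    """Remainder of a modulo the monic polynomial f."""
    return poly_divmod_monic(a, f)[1]


def fold(a, n):
    """Reduce modulo X^n - 1 by wrapping exponents around modulo n."""
    out = [0] * n
    for i, c in enumerate(a):
        out[i % n] += c
    return out


def substitute(f, u):
    """f(X) -> f(X^u)."""
    out = [0] * (u * (len(f) - 1) + 1)
    out[::u] = f
    return out


def check_lemma1(m, w, rng):
    """Spot-check parts a and b of Lemma 1 on random integer polynomials."""
    u = m // w
    phi_w, phi_m = cyclotomic(w), cyclotomic(m)
    f = [rng.randrange(-9, 10) for _ in range(len(phi_w) - 1)]   # degree <= phi(w) - 1
    g = [rng.randrange(-9, 10) for _ in range(len(phi_w) - 1)]
    h = mod(poly_mul(f, g), phi_w)                                # h = f*g mod Phi_w
    part_a = mod(substitute(h, u), phi_m) == mod(poly_mul(substitute(f, u), substitute(g, u)), phi_m)
    f = [rng.randrange(-9, 10) for _ in range(w)]                 # degree <= w - 1
    g = [rng.randrange(-9, 10) for _ in range(w)]
    h = fold(poly_mul(f, g), w)                                   # h = f*g mod (X^w - 1)
    part_b = fold(substitute(h, u), m) == fold(poly_mul(substitute(f, u), substitute(g, u)), m)
    return part_a, part_b


rng = random.Random(42)
assert check_lemma1(15, 5, rng) == (True, True)
assert check_lemma1(12, 4, rng) == (True, True)
```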


3  The  Basic  Ring-Switching  Procedure 

Given a big-ring ciphertext $\mathbf{c} \in (R_{m,q})^2$, encrypting a plaintext polynomial $a \in R_{m,2}$ relative to a big-ring
secret key $s \in R_m$, our goal is roughly to come up with $u$ small-ring ciphertexts $\mathbf{c}_0, \mathbf{c}_1, \ldots, \mathbf{c}_{u-1} \in (R_{w,q})^2$,
with $\mathbf{c}_k$ encrypting the part $a_{(k)} \in R_{w,2}$, all relative to some small-ring secret key $s' \in R_w$. The basic
procedure consists of the following steps:

1. Key-switch. We use the BGV key-switching method from [5] to switch into a “low-dimension”
secret key, still over the big ring $R_{m,q}$. The “low-dimension” key is $s'' \in R_m$, where $s''$ has nonzero
coefficients only for powers $X^i$ where $i \equiv 0 \pmod{u}$. That is, we have $s''_{(0)} = s'$ and $s''_{(k)} = 0$ for all
$k > 0$ (in other words, $s''(X) = s'(X^u)$).

2. Lift. Next we lift the resulting ciphertext from the big ring $R_{m,q}$ to the even bigger ring $C_{m,q}$, using
the delayed-reduction technique of Gentry et al. [12]. As described in Section 3.2, the new ciphertext
encrypts over the bigger ring $C_{m,q}$ a plaintext polynomial $a'$ related to $a$, still relative to the big-ring
secret key $s''$.
Figure 1: The transformation used to map elements between the different spaces.


3. Break. Now we can break the bigger-ring ciphertext into a collection of $u$ intermediate-ring ciphertexts
(i.e., pairs over $C_{w,q}$), such that the $k$-th ciphertext is a valid encryption of the $k$-th part of $a'$
(i.e., $a'_{(k)} \in C_{w,2}$). All these ciphertexts are valid (over $C_{w,q}$) with respect to the small-ring secret
key $s'$.

4. Reduce. Finally we reduce all the intermediate-ring ciphertexts modulo $(\Phi_w(X), q)$, thereby getting
small-ring ciphertexts over $R_{w,q}$, valid relative to $s'$.

We observe that the small-ring ciphertexts that we get this way may not encrypt the parts of the original
polynomial $a$. Rather, we will show that they encrypt some other polynomials $\tilde{a}_k$, which are defined as
$\tilde{a}_k = a'_{(k)} \bmod (\Phi_w, 2)$. We will show, however, that these plaintext polynomials $\tilde{a}_k$ satisfy the same
relation to the original plaintext polynomial, namely $a(X) = \sum_k X^k \cdot \tilde{a}_k(X^u) \pmod{\Phi_m(X), 2}$, which is all
we need for our application.
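The relation between the $\tilde{a}_k$'s and the original plaintext can be spot-checked in the clear. In the Python sketch below (toy sizes, hypothetical helper names), $a'$ is a random binary polynomial standing in for the lifted plaintext in $C_{m,2}$, $a = a' \bmod (\Phi_m, 2)$, and $\tilde{a}_k = a'_{(k)} \bmod (\Phi_w, 2)$. The check relies on $\Phi_m(X)$ dividing $\Phi_w(X^u)$, which holds because $\zeta^u$ is a primitive $w$-th root of unity for every primitive $m$-th root of unity $\zeta$.

```python
import random


def poly_divmod_monic(a, f):
    """Divide integer polynomial a by monic f (coefficient lists, lowest degree first)."""
    a = list(a)
    df = len(f) - 1
    q = [0] * max(len(a) - df, 1)
    for i in range(len(a) - 1, df - 1, -1):
        c = a[i]
        if c:
            q[i - df] = c
            for j in range(df + 1):
                a[i - df + j] -= c * f[j]
    return q, (a + [0] * df)[:df]


def cyclotomic(m):
    """Phi_m(X) = (X^m - 1) / prod of Phi_d(X) over the proper divisors d of m."""
    num = [-1] + [0] * (m - 1) + [1]
    for d in range(1, m):
        if m % d == 0:
            num, _ = poly_divmod_monic(num, cyclotomic(d))
    return num


def mod2(a, f):
    """Remainder of a modulo (f(X), 2), with coefficients in {0, 1}."""
    return [x % 2 for x in poly_divmod_monic(a, f)[1]]


def check_part_relation(m, u, rng):
    """Return True if sum_k X^k * a~_k(X^u) = a' (mod Phi_m, 2) for a random a'."""
    w = m // u
    phi_m, phi_w = cyclotomic(m), cyclotomic(w)
    a_lift = [rng.randrange(2) for _ in range(m)]           # stands in for a' in C_{m,2}
    a = mod2(a_lift, phi_m)                                 # the big-ring plaintext
    tilde = [mod2(a_lift[k::u], phi_w) for k in range(u)]   # a~_0, ..., a~_{u-1}
    recombined = [0] * m
    for k, part in enumerate(tilde):
        for j, c in enumerate(part):
            recombined[k + u * j] += c
    return mod2(recombined, phi_m) == a


rng = random.Random(8)
assert check_part_relation(15, 3, rng)
assert check_part_relation(15, 5, rng)
assert check_part_relation(12, 3, rng)
```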

3.1  Switching  to  a  Low-Dimension  Key 

To enable this transformation, we include in the public key a “key-switching matrix”, essentially encrypting
the old key $s$ under the new low-dimension key $s''$. Note that using such a low-dimension secret key has
security implications (since it severely reduces the dimension of the underlying LWE problem). In our case,
however, the whole point of switching to a smaller ring is to get lower dimension, so we do not sacrifice
anything new. Indeed, we show below that, assuming the hardness of the decision ring-LWE problem [16]
over the ring $R_{w,q}$, the key-switching matrix in the public key is indistinguishable from a uniformly random
matrix over $R_{m,q}$ (even for a distinguisher that knows the old secret key $s$).

The ring-LWE problem in $R_{w,q}$. We denote the secret-key and error distributions prescribed in the ring-LWE
problem in $R_{w,q}$ by $\mathcal{S}_w$ and $\mathcal{E}_w$, respectively. (E.g., these could be low-variance Gaussians in $R_w$
rounded modulo $q$, or some distributions involving the dual as in [16].) We also denote the uniform distribution
on $R_{w,q}$ by $\mathcal{U}_w$. For a fixed random secret $s' \leftarrow \mathcal{S}_w$, the ring-LWE problem in $R_{w,q}$ is, given many
pairs $(\gamma_i, \delta_i)$ with $\gamma_i \leftarrow \mathcal{U}_w$, to distinguish the case where the $\delta_i$'s are chosen as $\delta_i = s' \cdot \gamma_i + \eta_i$ with $\eta_i \leftarrow \mathcal{E}_w$
from the case where they are chosen uniformly at random, $\delta_i \leftarrow \mathcal{U}_w$.
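For concreteness, the two distributions in this decision problem can be sampled as in the Python sketch below. The modulus $q = 12289$, the choice $w = 5$, and ternary secrets and errors in place of rounded Gaussians are all illustrative assumptions, and the helper names are hypothetical. Knowing $s'$ makes the two cases easy to tell apart (the residual $\delta_i - s' \cdot \gamma_i$ is short only in the first case); the hardness assumption is that without $s'$ they are indistinguishable.

```python
import random

Q = 12289                        # hypothetical modulus
PHI_W = [1, 1, 1, 1, 1]          # Phi_5(X) = 1 + X + X^2 + X^3 + X^4, i.e. w = 5
N = len(PHI_W) - 1               # phi(5) = 4
rng = random.Random(7)


def ring_mul(a, b):
    """Product in R_{w,q} = Z[X]/(Phi_w(X), q)."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    for i in range(len(prod) - 1, N - 1, -1):   # Phi_w is monic of degree N
        c = prod[i]
        for j in range(N + 1):
            prod[i - N + j] -= c * PHI_W[j]
    return [x % Q for x in prod[:N]]


def ternary():
    """Stand-in for the secret-key and error distributions (e.g. rounded Gaussians)."""
    return [rng.choice([-1, 0, 1]) for _ in range(N)]


def rlwe_pairs(s, count, real=True):
    """Pairs (gamma_i, delta_i) with gamma_i uniform; delta_i = s*gamma_i + eta_i if real,
    and delta_i uniform otherwise."""
    pairs = []
    for _ in range(count):
        gamma = [rng.randrange(Q) for _ in range(N)]
        if real:
            delta = [(x + y) % Q for x, y in zip(ring_mul(s, gamma), ternary())]
        else:
            delta = [rng.randrange(Q) for _ in range(N)]
        pairs.append((gamma, delta))
    return pairs


def residual_is_short(s, pair):
    """With the secret in hand: does delta - s*gamma have all coefficients in {-1, 0, 1}?"""
    gamma, delta = pair
    diff = [(d - x) % Q for d, x in zip(delta, ring_mul(s, gamma))]
    return all(r in (0, 1, Q - 1) for r in diff)


s = ternary()
assert all(residual_is_short(s, p) for p in rlwe_pairs(s, 10))
assert not all(residual_is_short(s, p) for p in rlwe_pairs(s, 10, real=False))
```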
The key-switching matrix. Let $s \in R_m$ be the old big-ring secret key, and $s' \in R_w$ be the small-ring
secret key that we want to switch into (where $s'$ was chosen from the secret-key distribution $\mathcal{S}_w$). Define
the new big-ring low-dimension key $s'' \in R_m$ as the unique polynomial of degree less than $m$ such that
$s''_{(0)} = s'$ and $s''_{(k)} = 0$ for all $k > 0$. In other words, $s''(X) = s'(X^u)$, i.e., the coefficients $s''_0, s''_u, s''_{2u}, \ldots$
are exactly the coefficients of $s'$, and all the other coefficients of $s''$ are zero.

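For our key-switching matrix we use the following distribution of "error vectors" in $R_{m,q}$. We first draw independently at random $u$ low-norm polynomials from the ring-LWE error distribution, $\eta_{(0)}, \ldots, \eta_{(u-1)} \leftarrow \mathcal{E}_w$, then assemble from the $\eta_{(k)}$'s a single error polynomial $e'(X) = \sum_{k=0}^{u-1} X^k \cdot \eta_{(k)}(X^u)$, and output $e = e' \bmod (\Phi_m, q)$. That is, we have the distribution

$$\mathcal{E}_m = \Big\{ \eta_{(0)}, \ldots, \eta_{(u-1)} \leftarrow \mathcal{E}_w,\ \text{output } \sum_{k=0}^{u-1} X^k \cdot \eta_{(k)}(X^u) \bmod (\Phi_m, q) \Big\}.$$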

Note that $e'$ before the reduction mod $(\Phi_m, q)$ has degree smaller than $\phi(w) \cdot u < m$, and its norm-squared is the sum of the norm-squared of the $\eta_{(k)}$'s. Hence $e'$ is a low-norm polynomial, and the norm of $e$ after the reduction is larger by at most a factor of $c_m$ ($c_m$ is the ring constant for $R_m$), so $e$ too is a low-norm polynomial.†
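The assembly step that defines $\mathcal{E}_m$, and the claim that the norm-squared of $e'$ equals the sum of the norm-squared of the $\eta_{(k)}$'s, can be checked directly. The Python sketch below is illustrative only: polynomials are coefficient lists (lowest degree first), the parameters $u = 3$, $\phi(w) = 12$ are example values, the uniform ternary sampler is a hypothetical stand-in for $\mathcal{E}_w$, and the final reduction modulo $(\Phi_m, q)$ is omitted.

```python
import random

def assemble(etas, u):
    """Coefficients of e'(X) = sum_k X^k * eta_(k)(X^u).

    Coefficient j of eta_(k) lands at exponent k + u*j, so the u inputs occupy
    disjoint residue classes of exponents modulo u (a "direct sum").
    """
    out = [0] * (u * max(len(p) for p in etas))
    for k, eta in enumerate(etas):
        for j, coef in enumerate(eta):
            out[k + u * j] = coef
    return out

def sample_ternary(n, rng):
    """Hypothetical stand-in for the ring-LWE error distribution E_w."""
    return [rng.choice((-1, 0, 1)) for _ in range(n)]

def norm_sq(p):
    return sum(c * c for c in p)

rng = random.Random(2024)
u, phi_w = 3, 12                      # illustrative values (e.g. w = 21 has phi(w) = 12)
etas = [sample_ternary(phi_w, rng) for _ in range(u)]
e_prime = assemble(etas, u)           # degree < phi(w) * u, before any reduction mod Phi_m
```

Because every coefficient of $e'$ comes from exactly one $\eta_{(k)}$, the norm-squared is additive and each part can be read back with a stride-$u$ slice.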

Given the old key $s \in R_{m,q}$ and the new $s' \in R_{w,q}$, we draw at random $\ell = \lceil \log q \rceil$ elements from the error distribution, $e_0, \ldots, e_{\ell-1} \leftarrow \mathcal{E}_m$, and the columns of our key-switching matrix are the pairs

$$\Big\{ (\beta_i, \alpha_i)^T : \alpha_i \leftarrow \mathcal{U}_m,\ \beta_i = 2^i s - (s'' \cdot \alpha_i + 2e_i) \bmod (\Phi_m, q) \Big\}_{i=0}^{\ell-1},$$

where $\mathcal{U}_m$ is the uniform distribution over the big ring $R_{m,q}$. (Note that even if the secret-key and error distributions over the small ring involve the "dual lattice" as in [16], the $\beta_i$'s are still going to be in the big ring, because all their parts are in the small ring.)‡

Since the errors $e_i$ have low norm, this is a functional key-switching matrix, as described in [7]. Given an $s$-ciphertext $c = (c_0, c_1)$, we decompose $c_1$ into its bit representation, thus getting an $\ell$-vector of polynomials with 0-1 coefficients. Multiplying that vector by the key-switching matrix and adding $c_0$ to the first coordinate, we get a new ciphertext $c' = (c'_0, c'_1)$ with respect to the new low-dimension big-ring key $s''$. As for security, we prove the following lemma.
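To make the bit-decomposition key switching concrete, here is a toy Python sketch over $R_{m,q}$ with $m = 9$, $u = w = 3$ and $q = 2^{31} - 1$. These parameters, the helper names (`cyclotomic`, `mul_mod`, `key_switch`, `decrypt`), and the use of uniform ternary secrets and errors in place of $\mathcal{S}_w$ and $\mathcal{E}_m$ are assumptions for illustration; the toy offers no security. It checks that switching an $s$-ciphertext with the columns $(\beta_i, \alpha_i)$ yields a ciphertext that decrypts correctly under $s''(X) = s'(X^u)$, since $c'_0 + c'_1 \cdot s'' = c_0 + c_1 \cdot s - 2\sum_i c_{1,i} \cdot e_i$, where $c_{1,i}$ is the $i$'th bit polynomial of $c_1$.

```python
import random

def exact_div(a, b):
    """Exact quotient a / b of integer polynomials (lowest degree first); b must be monic."""
    a, quot = list(a), [0] * (len(a) - len(b) + 1)
    for i in range(len(quot) - 1, -1, -1):
        coef = a[i + len(b) - 1]
        quot[i] = coef
        for j, bj in enumerate(b):
            a[i + j] -= coef * bj
    assert not any(a), "division was not exact"
    return quot

def cyclotomic(n):
    """Phi_n(X) = (X^n - 1) / prod_{d | n, d < n} Phi_d(X)."""
    num = [-1] + [0] * (n - 1) + [1]
    for d in range(1, n):
        if n % d == 0:
            num = exact_div(num, cyclotomic(d))
    return num

def mul_mod(a, b, f, q):
    """a * b mod (f, q) for a monic integer polynomial f; len(f) - 1 coefficients in [0, q)."""
    n = len(f) - 1
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    for i in range(len(prod) - 1, n - 1, -1):     # cancel degrees >= n using f
        c = prod[i]
        for j, fj in enumerate(f):
            prod[i - n + j] -= c * fj
    return [c % q for c in (prod + [0] * n)[:n]]

m, u, w = 9, 3, 3                 # toy parameters with m = u * w (illustrative only)
q = 2**31 - 1                     # odd toy modulus (hypothetical)
ell = q.bit_length()              # bits per coefficient, i.e. ceil(log2 q)
phi_m = cyclotomic(m)
n = len(phi_m) - 1                # phi(m)
rng = random.Random(7)

def ternary(k):
    """Hypothetical small distribution standing in for the key and error distributions."""
    return [rng.choice((-1, 0, 1)) % q for _ in range(k)]

def uniform():
    return [rng.randrange(q) for _ in range(n)]

s = ternary(n)                                    # old big-ring key
s_small = ternary(len(cyclotomic(w)) - 1)         # s' in R_w
s2 = [0] * n
for j, c in enumerate(s_small):
    s2[u * j] = c                                 # s''(X) = s'(X^u)

# columns (beta_i, alpha_i) with beta_i = 2^i s - (s'' alpha_i + 2 e_i) mod (Phi_m, q)
ksm = []
for i in range(ell):
    alpha, e = uniform(), ternary(n)
    s2_alpha = mul_mod(s2, alpha, phi_m, q)
    ksm.append(([(2**i * x - y - 2 * z) % q for x, y, z in zip(s, s2_alpha, e)], alpha))

def key_switch(c0, c1):
    """Bit-decompose c1 and multiply the 0-1 vector by the key-switching matrix."""
    new0, new1 = list(c0), [0] * n
    for i, (beta, alpha) in enumerate(ksm):
        bits = [(c >> i) & 1 for c in c1]        # i-th bit of every coefficient of c1
        new0 = [(x + y) % q for x, y in zip(new0, mul_mod(bits, beta, phi_m, q))]
        new1 = [(x + y) % q for x, y in zip(new1, mul_mod(bits, alpha, phi_m, q))]
    return new0, new1

def decrypt(c0, c1, key):
    """[c0 + c1 * key mod (Phi_m, q)] in centered form, then reduced mod 2."""
    z = [(x + y) % q for x, y in zip(c0, mul_mod(c1, key, phi_m, q))]
    return [(x - q if x > q // 2 else x) % 2 for x in z]

# encrypt a 0-1 plaintext a under s: c0 + c1 * s = a + 2 * noise (mod Phi_m, q)
a = [rng.randint(0, 1) for _ in range(n)]
c1 = uniform()
noise = ternary(n)
c1s = mul_mod(c1, s, phi_m, q)
c0 = [(ai + 2 * ei - x) % q for ai, ei, x in zip(a, noise, c1s)]

c0_new, c1_new = key_switch(c0, c1)
recovered = decrypt(c0_new, c1_new, s2)
```

The identity above holds exactly in $R_{m,q}$; decryption succeeds here only because the toy modulus is far larger than the accumulated error $2\sum_i c_{1,i} \cdot e_i$.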

Lemma 2. If the decision ring-LWE problem over the ring $R_{w,q}$ is hard, then the key-switching matrix above is indistinguishable from a uniformly random $2 \times \ell$ matrix with all the entries drawn independently from $\mathcal{U}_m$. The indistinguishability holds even if the distinguisher gets as input the old secret key $s \in R_m$.

Proof. Our goal is to show that, under the hardness of ring-LWE in $R_{w,q}$, it is infeasible to distinguish the case where the $\beta_i$'s were chosen as prescribed in the scheme from the case where they are uniformly random according to $\mathcal{U}_m$. That is, we show that an adversary $\mathcal{A}$ that, given the old secret key $s$ and the matrix of $(\beta_i, \alpha_i)$'s, can distinguish between these two distributions can be used to solve the ring-LWE problem in the small ring $R_{w,q}$.

The reduction. A ring-LWE distinguisher $\mathcal{B}$ gets $\ell \cdot u$ pairs $(\gamma_{i,k}, \delta_{i,k})$, for $i = 0, 1, \ldots, \ell - 1$ and $k = 0, 1, \ldots, u - 1$, where the $\gamma_{i,k}$'s are uniform in $R_{w,q}$ and the $\delta_{i,k}$'s are either set as $s' \cdot \gamma_{i,k} + \eta_{i,k}$, for $\eta_{i,k} \leftarrow \mathcal{E}_w$, or chosen at random, $\delta_{i,k} \leftarrow \mathcal{U}_w$. $\mathcal{B}$ begins by choosing an "old secret key" in the big ring, $s \in R_{m,q}$ (according to whatever distribution the scheme specifies). Then $\mathcal{B}$ assembles the $\alpha_i$'s and $\beta_i$'s by setting

$$\alpha_i(X) = 2 \cdot \sum_{k=0}^{u-1} X^k \cdot \gamma_{i,k}(X^u) \bmod (\Phi_m, q) \quad \text{and} \quad \beta_i(X) = 2^i \cdot s - 2 \cdot \sum_{k=0}^{u-1} X^k \cdot \delta_{i,k}(X^u) \bmod (\Phi_m, q).$$

Finally, $\mathcal{B}$ runs the adversary $\mathcal{A}$ on $s$ and the matrix with columns $(\beta_i, \alpha_i)^T$, and outputs whatever $\mathcal{A}$ does.

† This argument can be refined to eliminate the dependence on the "smallness" of $c_m$; see Remark 1 at the end of the section.
‡ We could alternatively use the key-switching variant from [14], where the "matrix" consists of a single column $(\beta, \alpha)$, but with respect to a larger modulus $Q \approx q^2 \cdot m$. The proof of security would then depend on the hardness of ring-LWE in $R_{w,Q}$ rather than in $R_{w,q}$.

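Analysis. We observe that when we have polynomials $f_0, f_1, \ldots, f_{u-1} \in R_{w,q}$ and we set $g(X) = \sum_{k=0}^{u-1} X^k f_k(X^u) \bmod (\Phi_m, q)$, then the coefficients of $g$ are related to those of the $f_k$'s via a $(\phi(w) \cdot u) \times \phi(m)$ matrix of full rank (i.e., rank $\phi(m)$) over $\mathbb{Z}/q\mathbb{Z}$. (Indeed, the monomials $X^{k + uj}$ with $0 \le k < u$ and $0 \le j < \phi(w)$ include every $X^i$ with $i < \phi(m)$, since $\phi(m) \le u \cdot \phi(w)$, and such monomials are unaffected by reduction modulo $\Phi_m$; so the map is onto.) When the $f_k$'s are drawn from $\mathcal{U}_w$, all their coefficients are uniform in $\mathbb{Z}/q\mathbb{Z}$, and therefore so are all the coefficients of $g$.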

Applying this observation to the reduction above, since the $\gamma_{i,k}$'s are uniform in the small ring $R_{w,q}$, the $\alpha_i$'s are set as twice a uniform element in the big ring $R_{m,q}$, which is also uniform since $q$ is odd. Similarly, if the $\delta_{i,k}$'s are uniform in $R_{w,q}$, then the $\beta_i$'s are also uniform in the big ring $R_{m,q}$. On the other hand, if the $\delta_{i,k}$'s are chosen as $\delta_{i,k} = s' \cdot \gamma_{i,k} + \eta_{i,k} \bmod (\Phi_w, q)$, with $\eta_{i,k} \leftarrow \mathcal{E}_w$, then we have


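$$\begin{aligned}
\beta_i(X) &= 2^i \cdot s(X) - 2 \sum_{k=0}^{u-1} X^k \cdot \delta_{i,k}(X^u) \\
&= 2^i \cdot s(X) - 2 \sum_{k=0}^{u-1} X^k \cdot \underbrace{\big[(s' \cdot \gamma_{i,k} + \eta_{i,k}) \bmod (\Phi_w, q)\big]}_{\delta_{i,k}\text{ evaluated at } X^u}(X^u) \\
&\overset{(*)}{=} 2^i \cdot s(X) - 2 \sum_{k=0}^{u-1} X^k \cdot \underbrace{\big[s' \cdot \gamma_{i,k} + \eta_{i,k}\big]}_{\text{no modular reduction}}(X^u) \\
&= 2^i \cdot s(X) - \underbrace{s'(X^u)}_{s''(X)} \cdot \underbrace{2 \sum_{k=0}^{u-1} X^k \cdot \gamma_{i,k}(X^u)}_{\alpha_i(X)} - 2 \underbrace{\sum_{k=0}^{u-1} X^k \cdot \eta_{i,k}(X^u)}_{e_i(X)} \pmod{(\Phi_m, q)},
\end{aligned}$$

where the equality (*) follows from Lemma 3 (part a). In this case the $\alpha_i$'s are still uniformly random, but the $e_i$'s are drawn exactly from our error distribution $\mathcal{E}_m$ in the big ring $R_{m,q}$. This completes the proof. □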
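The step marked (*) uses the fact that substituting $X \mapsto X^u$ respects multiplication modulo the cyclotomic polynomials: for every primitive $m$'th root of unity $\zeta$, $\zeta^u$ is a primitive $w$'th root of unity, so $\Phi_m(X)$ divides $\Phi_w(X^u)$, and reducing modulo $\Phi_w$ before substituting $X^u$ changes nothing modulo $\Phi_m$. The Python sketch below checks this over the integers for a few illustrative parameter pairs; the helper names and parameters are assumptions, and the check is an illustration rather than a proof.

```python
import random

def poly_mul(a, b):
    """Product of integer polynomials (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_mod(a, f):
    """Remainder of a modulo the monic integer polynomial f (len(f) - 1 coefficients)."""
    a, n = list(a), len(f) - 1
    for i in range(len(a) - 1, n - 1, -1):
        c = a[i]
        for j, fj in enumerate(f):
            a[i - n + j] -= c * fj
    return (a + [0] * n)[:n]

def exact_div(a, b):
    """Exact quotient a / b for a monic divisor b."""
    a, quot = list(a), [0] * (len(a) - len(b) + 1)
    for i in range(len(quot) - 1, -1, -1):
        coef = a[i + len(b) - 1]
        quot[i] = coef
        for j, bj in enumerate(b):
            a[i + j] -= coef * bj
    assert not any(a), "division was not exact"
    return quot

def cyclotomic(n):
    """Phi_n(X) = (X^n - 1) / prod_{d | n, d < n} Phi_d(X)."""
    num = [-1] + [0] * (n - 1) + [1]
    for d in range(1, n):
        if n % d == 0:
            num = exact_div(num, cyclotomic(d))
    return num

def subst_pow(f, u):
    """Coefficients of f(X^u)."""
    out = [0] * (u * (len(f) - 1) + 1)
    for j, c in enumerate(f):
        out[u * j] = c
    return out

def check_embedding(m, w, trials=20, seed=0):
    u = m // w
    phi_m, phi_w = cyclotomic(m), cyclotomic(w)
    # Phi_m(X) divides Phi_w(X^u)
    assert not any(poly_mod(subst_pow(phi_w, u), phi_m))
    rng = random.Random(seed)
    n_w = len(phi_w) - 1
    for _ in range(trials):
        f = [rng.randint(-5, 5) for _ in range(n_w)]
        g = [rng.randint(-5, 5) for _ in range(n_w)]
        # [(f * g) mod Phi_w](X^u)  versus  f(X^u) * g(X^u), both reduced mod Phi_m
        lhs = poly_mod(subst_pow(poly_mod(poly_mul(f, g), phi_w), u), phi_m)
        rhs = poly_mod(poly_mul(subst_pow(f, u), subst_pow(g, u)), phi_m)
        assert lhs == rhs
    return True
```

Working over $\mathbb{Z}$ rather than $\mathbb{Z}/q\mathbb{Z}$ is a deliberate simplification: an identity over the integers also holds after reducing coefficients modulo $q$.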


3.2  Lifting to the Bigger Ring $C_{m,q}$

To lift the ciphertexts from the big ring $R_{m,q}$ to the bigger ring $C_{m,q}$, we use the "delayed reduction" technique of Gentry et al. (from the full version of [12]), which builds on the following lemma:

Lemma 3. ([12, Lemma 12]) For any integer $m$ there is an integer polynomial $G_m$ of degree at most $m - 1$ such that $G_m(\alpha) = m$ for every complex primitive $m$-th root of unity $\alpha$, and $G_m(\beta) = 0$ for every complex non-primitive $m$-th root of unity $\beta$. Moreover, the Euclidean norm of $G_m$'s coefficient vector is $\sqrt{m \cdot \phi(m)}$.

Denoting $Q_m(X) = (X^m - 1)/\Phi_m(X)$, we have $G_m(X) \equiv m \pmod{\Phi_m}$ and $G_m(X) \equiv 0 \pmod{Q_m}$. We can use polynomial Chinese remaindering to construct $G_m$ from its remainders modulo $\Phi_m(X)$ and $Q_m(X)$. Since $G_m(X) \equiv 0 \pmod{Q_m}$, we can use $G_m$ to "lift" any equality modulo $\Phi_m$ to an equality modulo $X^m - 1$. Namely, if we have $f \equiv g \pmod{\Phi_m}$, then we also have $G_m \cdot f \equiv G_m \cdot g \pmod{X^m - 1}$. (Indeed, $f - g = \Phi_m \cdot r$ for some $r$, so $G_m \cdot (f - g)$ is a multiple of $Q_m \cdot \Phi_m = X^m - 1$.) Specifically, for the decryption formula, we start from a valid big-ring ciphertext that satisfies the formula $c_0 + c_1 \cdot s'' \equiv a + 2e + qk \pmod{\Phi_m}$ (for some low-norm polynomial $e$ and a quotient polynomial $k$), then multiply both sides by $G_m$ to obtain

$$(G_m \cdot c_0) + (G_m \cdot c_1) \cdot s'' \equiv 2(G_m \cdot e) + (G_m \cdot a) + q(G_m \cdot k) \pmod{X^m - 1}.$$

Assuming that $q \gg m$, the products $G_m \cdot e \bmod (X^m - 1)$ and $G_m \cdot a \bmod (X^m - 1)$ are both low-norm. Thus, denoting $c = G_m \cdot c_0 \bmod (X^m - 1)$ and $d = G_m \cdot c_1 \bmod (X^m - 1)$, we get that the ciphertext $(c, d)$ is a valid encryption over the bigger ring $C_m$ of $a' = G_m \cdot a \bmod (X^m - 1, 2)$, relative to the secret key $s''$. (We note that upon decryption, one can recover the original plaintext polynomial $a$ simply by reducing $a'$ modulo $(\Phi_m(X), 2)$: this yields $[m \cdot a]_2 = a$, because $G_m(X) \equiv m \pmod{\Phi_m}$ and $m$ is odd.)
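Lemma 3 and the decoding remark can be sanity-checked numerically. One way to compute $G_m$ (an assumption of this sketch, consistent with the lemma) is as the inverse DFT of its prescribed values at the $m$'th roots of unity, which gives the Ramanujan sums $c_m(j) = \sum_{e \mid \gcd(m, j)} \mu(m/e) \cdot e$ as coefficients. The Python sketch below, with illustrative helper names and odd test moduli, checks $G_m \equiv m \pmod{\Phi_m}$, $G_m \equiv 0 \pmod{Q_m}$, the norm $\sqrt{m \cdot \phi(m)}$, and that reducing $a' = G_m \cdot a \bmod (X^m - 1, 2)$ modulo $(\Phi_m, 2)$ returns $a$.

```python
import random
from math import gcd

def poly_mul(a, b):
    """Product of integer polynomials (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_mod(a, f):
    """Remainder of a modulo the monic integer polynomial f (len(f) - 1 coefficients)."""
    a, n = list(a), len(f) - 1
    for i in range(len(a) - 1, n - 1, -1):
        c = a[i]
        for j, fj in enumerate(f):
            a[i - n + j] -= c * fj
    return (a + [0] * n)[:n]

def exact_div(a, b):
    """Exact quotient a / b for a monic divisor b."""
    a, quot = list(a), [0] * (len(a) - len(b) + 1)
    for i in range(len(quot) - 1, -1, -1):
        coef = a[i + len(b) - 1]
        quot[i] = coef
        for j, bj in enumerate(b):
            a[i + j] -= coef * bj
    assert not any(a), "division was not exact"
    return quot

def cyclotomic(n):
    """Phi_n(X) = (X^n - 1) / prod_{d | n, d < n} Phi_d(X)."""
    num = [-1] + [0] * (n - 1) + [1]
    for d in range(1, n):
        if n % d == 0:
            num = exact_div(num, cyclotomic(d))
    return num

def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def G(m):
    """Coefficients of G_m: Ramanujan sums c_m(j) = sum_{e | gcd(m, j)} mu(m / e) * e."""
    coeffs = []
    for j in range(m):
        g = gcd(m, j)
        coeffs.append(sum(mobius(m // e) * e for e in range(1, g + 1) if g % e == 0))
    return coeffs

def check_G(m, seed=0):
    phi_m = cyclotomic(m)
    n = len(phi_m) - 1                                    # phi(m)
    g = G(m)
    q_m = exact_div([-1] + [0] * (m - 1) + [1], phi_m)   # Q_m = (X^m - 1) / Phi_m
    assert poly_mod(g, phi_m) == [m] + [0] * (n - 1)     # G_m = m (mod Phi_m)
    assert not any(poly_mod(g, q_m))                      # G_m = 0 (mod Q_m)
    assert sum(c * c for c in g) == m * n                 # ||G_m||^2 = m * phi(m)
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n)]             # random plaintext in R_{m,2}
    a_prime = [0] * m
    for i, x in enumerate(poly_mul(g, a)):
        a_prime[i % m] += x                               # reduce modulo X^m - 1
    a_prime = [c % 2 for c in a_prime]
    # reducing a' modulo (Phi_m, 2) gives [m * a]_2 = a because m is odd
    assert [c % 2 for c in poly_mod(a_prime, phi_m)] == a
    return True
```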


3.3  Breaking  The  Ciphertext  into  Parts 

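After the transformation of the previous step, our ciphertext consists of a pair $(c, d)$ of polynomials in the bigger ring $C_{m,q} = \mathbb{Z}[X]/((X^m - 1), q)$. This ciphertext is valid with respect to the low-dimension secret key $s''$ of degree smaller than $\phi(m)$, satisfying $s''_{(0)} = s' \in R_{w,q}$ and $s''_{(1)} = \cdots = s''_{(u-1)} = 0$; in other words, $s''(X) = s'(X^u)$. Breaking $c, d$ into their parts $c_{(k)}, d_{(k)}$ (the polynomials of degree less than $w$ with $c(X) = \sum_{k=0}^{u-1} X^k c_{(k)}(X^u)$, and similarly for $d$), we then have the following lemma.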

Lemma 4. The polynomials $c_{(k)}$ and $d_{(k)}$ are such that the following equality holds over $\mathbb{Z}[X]$:

$$\big[c + d \cdot s'' \bmod (X^m - 1, q)\big](X) = \sum_{k=0}^{u-1} X^k \cdot \big[c_{(k)} + d_{(k)} \cdot s' \bmod (X^w - 1, q)\big](X^u).$$

(In the above equality, we have on both sides polynomials that are reduced to a lower degree and have their coefficients reduced modulo $q$, then evaluated at $X$ or $X^u$.)

Proof. Recall that decryption over $C_{m,q}$ calls for computing $z = c + d \cdot s'' \bmod (X^m - 1)$, then reducing $z$ modulo $q$ and then modulo 2. Breaking the polynomials $c$, $d$ and $s''$ into parts, we can write:

$$\begin{aligned}
(d \cdot s'')(X) &= \sum_{k=0}^{2u-2} X^k \sum_{\substack{i,j \text{ s.t.}\\ i+j=k}} d_{(i)}(X^u) \cdot s''_{(j)}(X^u) \\
&= \sum_{k=0}^{u-1} \Bigg( X^k \sum_{\substack{i,j \text{ s.t.}\\ i+j=k}} d_{(i)}(X^u) \cdot s''_{(j)}(X^u) + X^{u+k} \sum_{\substack{i,j \text{ s.t.}\\ i+j=k+u}} d_{(i)}(X^u) \cdot s''_{(j)}(X^u) \Bigg) \\
&\overset{(*)}{=} \sum_{k=0}^{u-1} X^k \cdot d_{(k)}(X^u) \cdot s''_{(0)}(X^u) = \sum_{k=0}^{u-1} X^k \cdot d_{(k)}(X^u) \cdot s'(X^u),
\end{aligned}$$

where the equality (*) follows since $s''_{(j)} = 0$ for $j > 0$ and $d_{(i)} = 0$ for $i \ge u$. This also implies that

$$(c + d \cdot s'')(X) = \sum_{k=0}^{u-1} X^k \cdot c_{(k)}(X^u) + \sum_{k=0}^{u-1} X^k \cdot d_{(k)}(X^u) \cdot s'(X^u) = \sum_{k=0}^{u-1} X^k \cdot \big[c_{(k)} + d_{(k)} \cdot s'\big](X^u).$$

Recall from Lemma 3 (part b) that whenever we have $h(X) \equiv f(X) \cdot g(X) \pmod{X^w - 1}$, then also $h(X^u) \equiv f(X^u) \cdot g(X^u) \pmod{X^m - 1}$. Hence we have

$$(c + d \cdot s'')(X) \equiv \sum_{k=0}^{u-1} X^k \cdot \big[c_{(k)} + d_{(k)} \cdot s' \bmod (X^w - 1)\big](X^u) \pmod{X^m - 1},$$

and since the right-hand side of the last equality is a polynomial of degree less than $m$, we get the following equality holding over $\mathbb{Z}[X]$:

$$\big[c + d \cdot s'' \bmod (X^m - 1, q)\big](X) = \sum_{k=0}^{u-1} X^k \cdot \big[c_{(k)} + d_{(k)} \cdot s' \bmod (X^w - 1, q)\big](X^u).$$

We note that in the above equality, we have on both sides polynomials that are reduced to a lower degree and have their coefficients reduced modulo $q$, then evaluated at $X$ or $X^u$. However, once we perform these modular reductions on both sides, both polynomials have degree less than $m$ and coefficients smaller than $q/2$ in absolute value, and since they are congruent modulo $((X^m - 1), q)$, they must be identical. □
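Since Lemma 4 is an exact identity, it can be tested on random inputs. The Python sketch below works in $C_{m,q}$ and $C_{w,q}$ directly via cyclic convolution; the parameters $(u, w, q)$, the ternary $s'$, and the helper names are hypothetical choices. The identity holds for any $s' \in C_{w,q}$, so the sketch does not enforce $\deg s' < \phi(w)$.

```python
import random

def cyclic_mul(a, b, n, q):
    """a * b in Z_q[X]/(X^n - 1): cyclic convolution with coefficients in [0, q)."""
    out = [0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % n] += x * y
    return [c % q for c in out]

def parts(c, u):
    """Parts c_(0), ..., c_(u-1) with c(X) = sum_k X^k * c_(k)(X^u)."""
    return [c[k::u] for k in range(u)]

def assemble(ps, u):
    """Inverse of parts(): coefficient j of part k goes to exponent k + u*j."""
    out = [0] * (u * len(ps[0]))
    for k, p in enumerate(ps):
        for j, coef in enumerate(p):
            out[k + u * j] = coef
    return out

def check_lemma4(u, w, q, seed=0):
    m = u * w
    rng = random.Random(seed)
    c = [rng.randrange(q) for _ in range(m)]
    d = [rng.randrange(q) for _ in range(m)]
    s1 = [rng.choice((-1, 0, 1)) % q for _ in range(w)]         # s' (hypothetical ternary key)
    s2 = assemble([s1] + [[0] * w for _ in range(u - 1)], u)   # s''(X) = s'(X^u)
    # left side: c + d * s'' mod (X^m - 1, q)
    lhs = [(x + y) % q for x, y in zip(c, cyclic_mul(d, s2, m, q))]
    # right side: assemble the parts c_(k) + d_(k) * s' mod (X^w - 1, q)
    rhs = assemble([[(x + y) % q for x, y in zip(ck, cyclic_mul(dk, s1, w, q))]
                    for ck, dk in zip(parts(c, u), parts(d, u))], u)
    return lhs == rhs
```

Because `assemble` places part $k$ only at exponents congruent to $k$ modulo $u$, this also illustrates the "direct sum" structure used in the norm argument that follows.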

Size of Polynomials. Importantly, the sum on the right-hand side of the last equality is a "direct sum", in the sense that the $k$'th summand has non-zero coefficients only in powers $X^i$ such that $i \equiv k \pmod u$. This means that each coefficient in the sum comes from exactly one of the summands. This, in turn, implies that the norm-squared of the left-hand side is the sum of the norm-squared of the terms on the right-hand side. Hence if the left-hand side has low norm, then every summand on the right must also have low norm.

We stress that this "direct sum" argument is the reason why we lift our ciphertext to the bigger ring $C_{m,q}$. This argument does not apply when working modulo $\Phi_m$; thus, without lifting, we could not have used the fact that the left-hand side has low norm to argue that all the terms on the right have low norm.

Ciphertexts in the intermediate ring $C_{w,q}$. Consider now the $u$ intermediate-ring ciphertexts over $C_{w,q}$:

$$(c_{(0)}, d_{(0)}),\ (c_{(1)}, d_{(1)}),\ \ldots,\ (c_{(u-1)}, d_{(u-1)}).$$

Since the bigger-ring ciphertext $(c, d)$ was a valid encryption of $a' = G_m \cdot a \bmod (X^m - 1, 2)$ over $C_{m,q}$ with respect to the secret key $s''$, we know that $[c + d \cdot s'' \bmod (X^m - 1, q)] = 2e' + a'$ for some low-norm error $e'$. Let us denote $b' = 2e' + a'$. From the equalities above (and the "direct sum" argument), we know that the $k$'th part of $b'$, namely $b'_{(k)} = 2e'_{(k)} + a'_{(k)}$, is obtained as $b'_{(k)} = [c_{(k)} + d_{(k)} \cdot s' \bmod (X^w - 1, q)]$.

As $e'_{(k)}$ is a low-norm error term, we conclude that the vectors $(c_{(k)}, d_{(k)})$ are valid encryptions of the parts $a'_{(k)}$ over $C_{w,q}$ with respect to the secret key $s'$. Thus we have shown that valid ciphertexts encrypting the parts of $a'$ (over the intermediate ring $C_{w,q}$ with respect to $s'$) can be obtained simply by breaking the polynomials $c, d$ into their parts.

3.4  Reducing to the Small Ring $R_{w,q}$

Now that we have valid ciphertexts $(c_{(k)}, d_{(k)})$ encrypting the parts $a'_{(k)}$ over the intermediate ring $C_{w,q}$ relative to $s'$, it only remains to reduce them into the small ring $R_{w,q}$. We do this simply by reducing each of the elements $c_{(k)}, d_{(k)}$ modulo $(\Phi_w, q)$, i.e., we set $c_k = c_{(k)} \bmod (\Phi_w, q)$ and $d_k = d_{(k)} \bmod (\Phi_w, q)$.

Lemma 5. The ciphertext $(c_k, d_k)$ is an encryption (over $R_{w,q}$) of the plaintext $a_k = a'_{(k)} \bmod (\Phi_w, 2) \in R_{w,2}$.

Proof. Recall that for all $k$ we have the equality (over $\mathbb{Z}[X]$)

$$c_{(k)} + d_{(k)} \cdot s' \bmod (X^w - 1, q) = 2e'_{(k)} + a'_{(k)}$$

for a low-norm error term $e'_{(k)}$. Denoting $b'_{(k)} = 2e'_{(k)} + a'_{(k)}$, we have that $b'_{(k)}$ is a low-norm polynomial in $C_{w,q}$.

Let us now denote $b_k = b'_{(k)} \bmod \Phi_w$ (without reduction modulo $q$). Since the $b'_{(k)}$ are low-norm, so are the $b_k$ (because reduction modulo $\Phi_w$ increases the norm by at most a factor of the ring constant $c_w$). This means that $b_k$ has norm much smaller than $q$, so it is already reduced modulo $q$. In other words, we also have $b_k = b'_{(k)} \bmod (\Phi_w, q)$.

Observe that $a_k = (a'_{(k)} \bmod \Phi_w) + 2 \cdot \rho_k$ for some low-norm $\rho_k$. The $\rho_k$'s have low norm because $a_k$ has low norm (being a 0-1 polynomial) and $(a'_{(k)} \bmod \Phi_w)$ also has low norm (being at most $c_w$ times the norm of the 0-1 polynomial $a'_{(k)}$). Next we argue that for all $k$ we have $b_k = 2e_k + a_k$ for low-norm error terms $e_k \in R_{w,q}$. This follows because

$$\begin{aligned}
b_k = (b'_{(k)} \bmod \Phi_w) &= (2 \cdot e'_{(k)} + a'_{(k)} \bmod \Phi_w) = (2 \cdot e'_{(k)} \bmod \Phi_w) + (a'_{(k)} \bmod \Phi_w) \\
&= 2 \cdot (e'_{(k)} \bmod \Phi_w) + a_k - 2 \cdot \rho_k = 2 \cdot \underbrace{\big((e'_{(k)} \bmod \Phi_w) - \rho_k\big)}_{e_k} + a_k.
\end{aligned}$$

Finally, we obtain:

$$(c_k + d_k \cdot s' \bmod (\Phi_w, q)) = (c_{(k)} + d_{(k)} \cdot s' \bmod (\Phi_w, q)) = (b'_{(k)} \bmod (\Phi_w, q)) = b_k = 2 \cdot e_k + a_k,$$

where the second equality holds because $\Phi_w$ divides $X^w - 1$. In other words, since $e_k$ has low norm, the pair $(c_k, d_k)$ is a valid ciphertext over $R_{w,q}$ with respect to the secret key $s'$, encrypting the plaintext polynomial $a_k \in R_{w,2}$. □

What are the $a_k$'s? At this point we are done converting the original big-ring ciphertext encrypting $a \in R_{m,2}$ into a collection of valid small-ring ciphertexts encrypting the $a_k$'s. But how are these $a_k$'s related to the original plaintext polynomial $a$? Ideally we would have liked the $a_k$'s to be the parts of $a$ (i.e., $a_k = a_{(k)}$), but this is not necessarily what we get. Still, we show that we can recover the original polynomial $a$ from the $a_k$'s via the same assembly formula,

$$a(X) = \sum_{k=0}^{u-1} X^k \cdot a_k(X^u) \bmod (\Phi_m, 2).$$

To show that, we first observe that both sides of the equation are 0-1 polynomials of degree less than $\phi(m)$, so to demonstrate equality it is enough to show that they agree when evaluated at $\phi(m)$ different points (from any field of our choice). In particular, we now show that they agree on all the primitive $m$'th roots of unity over the finite field $\mathbb{F}_{2^d}$. For this we recall the following basic facts:

1. The field $\mathbb{F}_{2^d}$ contains primitive $m$'th roots of unity, and if $\zeta \in \mathbb{F}_{2^d}$ is a primitive $m$'th root of unity, then $\zeta^u$ is a primitive $w$'th root of unity.

2. Since $G_m \equiv m \pmod{\Phi_m}$ and $m \equiv 1 \pmod 2$, we have $[G_m \bmod 2](\zeta) = 1$ for every primitive $m$'th root of unity $\zeta \in \mathbb{F}_{2^d}$. Since $a' = G_m \cdot a \bmod (X^m - 1, 2)$, it then follows that $a'(\zeta) = a(\zeta)$ for every primitive $m$'th root of unity $\zeta \in \mathbb{F}_{2^d}$.

3. Since $a_k = a'_{(k)} \bmod (\Phi_w, 2)$, we have $a_k(\tau) = a'_{(k)}(\tau)$ for every primitive $w$'th root of unity $\tau \in \mathbb{F}_{2^d}$.

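Putting all of these facts together, and using the assembly formula for $a'$ from the parts $a'_{(k)}$, we get for every primitive $m$'th root of unity $\zeta \in \mathbb{F}_{2^d}$:

$$a(\zeta) = a'(\zeta) = \sum_{k=0}^{u-1} \zeta^k \cdot a'_{(k)}(\zeta^u) = \sum_{k=0}^{u-1} \zeta^k \cdot a_k(\zeta^u).$$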
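The assembly formula can be checked end to end on plaintexts alone: lift $a$ to $a' = G_m \cdot a \bmod (X^m - 1, 2)$, reduce each part to $a_k = a'_{(k)} \bmod (\Phi_w, 2)$, and reassemble. This Python sketch computes $G_m$ through its Ramanujan-sum coefficients (an assumption about how to obtain it, consistent with Lemma 3); the function names and the $(m, w)$ pairs are illustrative. The $a_k$'s produced this way need not equal the parts $a_{(k)}$ of $a$, yet the reassembly still recovers $a$.

```python
import random
from math import gcd

def exact_div(a, b):
    """Exact quotient a / b of integer polynomials (lowest degree first); b must be monic."""
    a, quot = list(a), [0] * (len(a) - len(b) + 1)
    for i in range(len(quot) - 1, -1, -1):
        coef = a[i + len(b) - 1]
        quot[i] = coef
        for j, bj in enumerate(b):
            a[i + j] -= coef * bj
    assert not any(a), "division was not exact"
    return quot

def cyclotomic(n):
    """Phi_n(X) = (X^n - 1) / prod_{d | n, d < n} Phi_d(X)."""
    num = [-1] + [0] * (n - 1) + [1]
    for d in range(1, n):
        if n % d == 0:
            num = exact_div(num, cyclotomic(d))
    return num

def poly_mod(a, f):
    """Remainder of a modulo the monic integer polynomial f (len(f) - 1 coefficients)."""
    a, n = list(a), len(f) - 1
    for i in range(len(a) - 1, n - 1, -1):
        c = a[i]
        for j, fj in enumerate(f):
            a[i - n + j] -= c * fj
    return (a + [0] * n)[:n]

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def G(m):
    """Coefficients of G_m as Ramanujan sums c_m(j)."""
    return [sum(mobius(m // e) * e for e in range(1, gcd(m, j) + 1) if gcd(m, j) % e == 0)
            for j in range(m)]

def small_ring_plaintexts(a, m, w):
    """a_k = a'_(k) mod (Phi_w, 2), where a' = G_m * a mod (X^m - 1, 2)."""
    u, g = m // w, G(m)
    a_prime = [0] * m
    for i, x in enumerate(g):
        for j, y in enumerate(a):
            a_prime[(i + j) % m] += x * y      # product modulo X^m - 1
    a_prime = [c % 2 for c in a_prime]
    phi_w = cyclotomic(w)
    return [[c % 2 for c in poly_mod(a_prime[k::u], phi_w)] for k in range(u)]

def reassemble(a_ks, m, w):
    """sum_k X^k * a_k(X^u) mod (Phi_m, 2)."""
    u = m // w
    out = [0] * m
    for k, ak in enumerate(a_ks):
        for j, coef in enumerate(ak):
            out[k + u * j] += coef
    return [c % 2 for c in poly_mod(out, cyclotomic(m))]

rng = random.Random(1)
checks = []
for m, w in [(9, 3), (15, 5), (63, 21)]:
    n = len(cyclotomic(m)) - 1                    # phi(m)
    a = [rng.randint(0, 1) for _ in range(n)]     # random plaintext in R_{m,2}
    checks.append(reassemble(small_ring_plaintexts(a, m, w), m, w) == a)
```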

Remark 1. If we use the delayed reduction technique from [12, Appendix E], then we can keep everything relative to $X^m - 1$ and $X^w - 1$, and we do not need to rely on the smallness of the ring constants $c_m, c_w$. The key-switching matrices will remain modulo $\Phi_m$, however.

4  Homomorphic  Computation  in  the  Small  Ring 

So far we have shown how to break a big-ring ciphertext, encrypting some big-ring polynomial $a \in R_{m,2}$, into a collection of $u$ small-ring ciphertexts encrypting small-ring polynomials $a_0, a_1, \ldots, a_{u-1} \in R_{w,2}$ that are "related" to the original plaintext polynomial $a$. Namely, $a$ can be constructed as a particular big-ring linear combination of the $a_k$'s, $a(X) = \sum_{k=0}^{u-1} X^k \cdot a_k(X^u) \bmod (\Phi_m, 2)$.

This, however, still falls short of our goal of speeding up homomorphic computation by switching to small-ring ciphertexts. Indeed, we have not shown how to use the encryptions of the $a_k$'s for further homomorphic computation. Following the narrative of SIMD homomorphic computation from [19, 12, 13, 14], we view the big-ring plaintext polynomial $a$ as an encoding in the big ring of several plaintext elements from the extension field $\mathbb{F}_{2^d}$ (with $d$ the order of 2 in $(\mathbb{Z}/m\mathbb{Z})^*$). We therefore wish to obtain small-ring ciphertexts encrypting small-ring polynomials that encode the same underlying $\mathbb{F}_{2^d}$ elements.

One potential "algebraic issue" with this goal is that it may not always be possible to embed $\mathbb{F}_{2^d}$ elements inside small-ring polynomials from $R_{w,2}$. Recall that the extension degree $d$ is determined by the order of 2 in $(\mathbb{Z}/m\mathbb{Z})^*$. But the order of 2 in $(\mathbb{Z}/w\mathbb{Z})^*$ may be smaller than $d$; in general it will be some $d'$ that divides $d$. If $d' < d$, then we can only embed elements of the sub-field $\mathbb{F}_{2^{d'}}$ in small-ring polynomials from $R_{w,2}$, and not the $\mathbb{F}_{2^d}$ elements that we have encoded in the big-ring polynomial $a$. For most of this section we only consider the special case where the order of 2 in both $(\mathbb{Z}/m\mathbb{Z})^*$ and $(\mathbb{Z}/w\mathbb{Z})^*$ is the same $d$. We discuss possible extensions to the general case at the end of the section.

Even for the special case where the order of 2 in $(\mathbb{Z}/m\mathbb{Z})^*$ and $(\mathbb{Z}/w\mathbb{Z})^*$ is the same (and hence the "plaintext slots" in the small ring contain elements from the same extension field as those in the big ring), we still need to tackle the issue that big-ring polynomials have more plaintext slots than small-ring polynomials. Specifically, big-ring polynomials have $\ell_m = \phi(m)/d$ slots, whereas small-ring polynomials only have $\ell_w = \phi(w)/d$ slots. The solution here is simple: we just partition the slots in the original big-ring polynomial $a$ into $\ell_m/\ell_w = \phi(m)/\phi(w)$ groups, each consisting of $\ell_w$ slots. For each group we then construct a small-ring ciphertext, encrypting a small-ring polynomial that encodes the plaintext slots from that group.

One advantage of this approach is that if the original plaintext polynomial $a$ was "sparsely populated", holding only a few plaintext elements in its slots, then we can reduce the number of small-ring ciphertexts that we generate to the bare minimum needed to hold these few plaintext slots. A good example for this scenario is the computation of the AES circuit in [14]: since there are only 16 bytes in the AES state, we only use 16 slots in the plaintext polynomial $a$. In this case, as long as we have at least 16 slots in small-ring polynomials, we can continue working with a single small-ring ciphertext (as opposed to the $u$ ciphertexts that the technique of the previous section gives us).

4.1  Ring-Switching with Plaintext Encoding

Below we describe our method for converting the plaintext encoding between the different rings, for the special case where the order of 2 is the same in $(\mathbb{Z}/m\mathbb{Z})^*$ and $(\mathbb{Z}/w\mathbb{Z})^*$. As explained in Section 2.2, each plaintext slot in the big-ring polynomial is associated with a conjugacy class of 2 in $(\mathbb{Z}/m\mathbb{Z})^*$ (equivalently, an element in the quotient group $\mathcal{Q}_m = (\mathbb{Z}/m\mathbb{Z})^*/\langle 2 \rangle$), and a similar association holds between plaintext slots in small-ring polynomials and elements of the quotient group $\mathcal{Q}_w = (\mathbb{Z}/w\mathbb{Z})^*/\langle 2 \rangle$. We thus begin by relating the structures and representations of these two quotient groups. Below let $T_w = \{t'_1, \ldots, t'_{\ell_w}\} \subset (\mathbb{Z}/w\mathbb{Z})^*$ be a representative set for $\mathcal{Q}_w$, i.e., a set containing exactly one element from each conjugacy class in $(\mathbb{Z}/w\mathbb{Z})^*$, ordered arbitrarily.

Clearly, since $w$ divides $m$, $(\mathbb{Z}/m\mathbb{Z})^*$ consists of $\phi(m)/\phi(w)$ copies of $(\mathbb{Z}/w\mathbb{Z})^*$. That is, $(\mathbb{Z}/m\mathbb{Z})^*$ can be partitioned into $\phi(m)/\phi(w)$ disjoint sets, each of size $\phi(w)$, and each of them congruent modulo $w$ to $(\mathbb{Z}/w\mathbb{Z})^*$. Moreover, it is easy to see that when the order of 2 is the same in $(\mathbb{Z}/m\mathbb{Z})^*$ and $(\mathbb{Z}/w\mathbb{Z})^*$, this partitioning can be made to respect the conjugacy classes of 2. Namely, for any $t \in (\mathbb{Z}/m\mathbb{Z})^*$, we put $2t \bmod m$ in the same part as $t$. Such a conjugation-respecting partition of $(\mathbb{Z}/m\mathbb{Z})^*$ can be constructed greedily, adding conjugacy classes from $(\mathbb{Z}/m\mathbb{Z})^*$ to the current part (skipping any class whose residues modulo $w$ are already covered) until we have a complete copy of $(\mathbb{Z}/w\mathbb{Z})^*$, then proceeding to the next part. Let $S_1, S_2, \ldots$ be this partition of $(\mathbb{Z}/m\mathbb{Z})^*$, so we have the properties:

•  $S_i \cap S_j = \emptyset$ for all $i \ne j$, and $\bigcup_i S_i = (\mathbb{Z}/m\mathbb{Z})^*$;

•  For all $i$ we have $|S_i| = \phi(w)$, and also $S_i \bmod w = \{(s \bmod w) : s \in S_i\} = (\mathbb{Z}/w\mathbb{Z})^*$; and

•  For all $i$ we have $2S_i \bmod m = \{(2s \bmod m) : s \in S_i\} = S_i$.

Given the partition of $(\mathbb{Z}/m\mathbb{Z})^*$ into the $S_i$'s and the ordered representative set $T_w$ for $\mathcal{Q}_w$, one way of getting an ordered representative set $T_m$ for $\mathcal{Q}_m$ is to set

$$T_m = \{ t \in (\mathbb{Z}/m\mathbb{Z})^* : \exists\, t' \in T_w \text{ s.t. } t \equiv t' \pmod w \};$$

obviously this set $T_m$ has exactly one element from each conjugacy class in every part $S_i$. We can order it, $T_m = \{t_1, t_2, \ldots, t_{\ell_m}\}$, by taking all the elements from one part $S_i$ before taking any of the elements from the next part $S_{i+1}$, and among the elements from the same part using the ordering of $T_w$.
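The greedy construction of the partition $S_1, S_2, \ldots$ and of the ordered representative set $T_m$ can be sketched in Python as follows. The example uses $m = 63$, $w = 21$ (where 2 has order $d = 6$ modulo both), and chooses $T_w$ as the smallest element of each conjugacy class; these choices and the function names are illustrative assumptions.

```python
from math import gcd

def units(n):
    """Elements of (Z/nZ)^*."""
    return [t for t in range(1, n) if gcd(t, n) == 1]

def order_of_2(n):
    """Multiplicative order of 2 modulo an odd n > 1."""
    k, x = 1, 2 % n
    while x != 1:
        x, k = (2 * x) % n, k + 1
    return k

def doubling_orbit(t, n):
    """Conjugacy class {t, 2t, 4t, ...} of t modulo n."""
    orbit, x = [t], (2 * t) % n
    while x != t:
        orbit.append(x)
        x = (2 * x) % n
    return orbit

def greedy_partition(m, w):
    """Split (Z/mZ)^* into parts closed under doubling, each reducing mod w onto (Z/wZ)^*."""
    assert m % w == 0 and order_of_2(m) == order_of_2(w)
    remaining, parts = set(units(m)), []
    while remaining:
        part, covered = [], set()
        for t in sorted(remaining):
            if t % w not in covered:           # skip classes whose residues are already covered
                orbit = doubling_orbit(t, m)
                part += orbit
                covered.update(x % w for x in orbit)
        remaining -= set(part)
        parts.append(sorted(part))
    return parts

def representatives(m, w, parts):
    """T_w: smallest element of each class mod w. T_m: for each part in turn, the
    elements congruent mod w to the members of T_w, listed in the order of T_w."""
    T_w, seen = [], set()
    for t in units(w):
        if t not in seen:
            T_w.append(t)
            seen.update(doubling_orbit(t, w))
    T_m = [next(s for s in part if s % w == tw) for part in parts for tw in T_w]
    return T_w, T_m

m, w = 63, 21
parts = greedy_partition(m, w)
T_w, T_m = representatives(m, w, parts)
```

Each pass of the greedy loop picks exactly one class of $(\mathbb{Z}/m\mathbb{Z})^*$ above each class of $(\mathbb{Z}/w\mathbb{Z})^*$, which is why equal orders of 2 are needed.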

Finally, fixing a specific primitive $m$'th root of unity $\zeta \in \mathbb{F}_{2^d}$ and the particular primitive $w$'th root of unity $\tau = \zeta^u$, we let the $j$'th plaintext slot encoded in $a \in R_{m,2}$ be the evaluation $a(\zeta^{t_j}) \in \mathbb{F}_{2^d}$, and similarly the $j$'th plaintext slot encoded in $a^* \in R_{w,2}$ is the evaluation $a^*(\tau^{t'_j})$. The following lemma plays an important role in our transformation:

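Lemma 6. Let $m = u \cdot w$ for odd integers $u, w$, and denote by $d$ the order of 2 in $(\mathbb{Z}/m\mathbb{Z})^*$. Let $\zeta$ be a primitive $m$'th root of unity in $\mathbb{F}_{2^d}$, and denote $\tau = \zeta^u$, so $\tau$ is a primitive $w$'th root of unity.

Let $S \subset (\mathbb{Z}/m\mathbb{Z})^*$ be a subset satisfying (a) $|S| = \phi(w)$ and $S \bmod w = (\mathbb{Z}/w\mathbb{Z})^*$, and (b) $S$ is closed under multiplication by 2, $S = 2S \bmod m$. Then there exists a polynomial $h \in R_{w,2}$ such that for all $j \in S$, it holds that $h(\tau^j) = \zeta^j$.

Proof. Clearly, since $|S| = \phi(w)$, there exists a unique polynomial $h$ over $\mathbb{F}_{2^d}$ of degree smaller than $\phi(w)$ such that $h(\tau^j) = \zeta^j$ for all $j \in S$. It is left to show only that $h$ is a polynomial over the base field, i.e., with 0-1 coefficients. To show this, note that by definition of $h$ we have $h(\tau^j) = \zeta^j$ for all $j \in S$, and moreover $2j \in S$ whenever $j \in S$ (and hence $h(\tau^{2j}) = \zeta^{2j}$). Thus, we get for all $j \in S$

$$h\big((\tau^j)^2\big) = \zeta^{2j} = (\zeta^j)^2 = h(\tau^j)^2.$$

Since $S \bmod w = (\mathbb{Z}/w\mathbb{Z})^*$, the set $\{\tau^j : j \in S\}$ ranges over all the primitive $w$'th roots of unity in $\mathbb{F}_{2^d}$, so we have $h(\theta^2) = h(\theta)^2$ for every primitive $w$'th root of unity $\theta$. It is a well-known fact that for an arbitrary polynomial $h(X)$ of degree smaller than $\phi(w)$ over $\mathbb{F}_{2^d}$, if $h(\theta^2) = h(\theta)^2$ holds for every primitive $w$'th root of unity $\theta \in \mathbb{F}_{2^d}$, then $h$ is in fact a polynomial over the base field, i.e., a polynomial with 0-1 coefficients. (Writing $h^{\sigma}$ for $h$ with every coefficient squared, we have $h(\theta)^2 = h^{\sigma}(\theta^2)$, so $h$ and $h^{\sigma}$ agree on all $\phi(w)$ primitive $w$'th roots of unity; hence $h = h^{\sigma}$, and every coefficient is fixed by squaring, i.e., lies in $\mathbb{F}_2$.) This concludes the proof. □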
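Lemma 6 can also be observed numerically. The Python sketch below represents $\mathbb{F}_{2^6}$ as $\mathbb{F}_2[x]/(x^6 + x + 1)$ (an assumed choice of irreducible polynomial), takes $m = 63$, $w = 21$, $u = 3$, searches for an element $\zeta$ of order exactly 63, builds one part $S$ greedily, interpolates $h$ with $h(\tau^j) = \zeta^j$ for $j \in S$ by Lagrange interpolation, and then lets us check that every coefficient of $h$ lies in $\{0, 1\}$. The helper names and parameters are illustrative.

```python
from math import gcd

MOD = 0b1000011   # x^6 + x + 1 (assumed representation of F_64 = F_2[x]/(MOD))

def gf_mul(a, b):
    """Multiply two elements of F_64 (bit i of an int = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000000:        # degree 6 appeared: reduce by x^6 + x + 1
            a ^= MOD
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def gf_inv(a):
    return gf_pow(a, 62)         # a^(64 - 2) = a^(-1) for a != 0

def gf_eval(poly, x):
    """Horner evaluation of a polynomial with F_64 coefficients (lowest degree first)."""
    acc = 0
    for coef in reversed(poly):
        acc = gf_mul(acc, x) ^ coef
    return acc

def mul_linear(p, c):
    """p(X) * (X + c) over F_64 (X - c equals X + c in characteristic 2)."""
    out = [0] * (len(p) + 1)
    for i, coef in enumerate(p):
        out[i + 1] ^= coef
        out[i] ^= gf_mul(coef, c)
    return out

def interpolate(xs, ys):
    """Lagrange interpolation: the unique h with deg h < len(xs) and h(xs[i]) = ys[i]."""
    h = [0] * len(xs)
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis, denom = [1], 1
        for i, xi in enumerate(xs):
            if i != j:
                basis = mul_linear(basis, xi)
                denom = gf_mul(denom, xj ^ xi)
        scale = gf_mul(yj, gf_inv(denom))
        for k, b in enumerate(basis):
            h[k] ^= gf_mul(b, scale)
    return h

m, w = 63, 21                    # 2 has order 6 modulo both, so d = 6 and F_{2^d} = F_64
u = m // w
# an element of order exactly 63 is a primitive m-th root of unity
zeta = next(z for z in range(2, 64) if gf_pow(z, 21) != 1 and gf_pow(z, 9) != 1)
tau = gf_pow(zeta, u)            # primitive w-th root of unity

# one part S: a doubling orbit mod m above each conjugacy class mod w
S, covered = [], set()
for t in range(1, m):
    if gcd(t, m) == 1 and t % w not in covered:
        x = t
        while x not in S:
            S.append(x)
            covered.add(x % w)
            x = (2 * x) % m

h = interpolate([gf_pow(tau, j) for j in S], [gf_pow(zeta, j) for j in S])
```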

We are now ready to show how to convert a big-ring ciphertext $c$, encrypting some polynomial $a \in R_{m,2}$, into a single small-ring ciphertext that encrypts some other $a^* \in R_{w,2}$, such that $a^*$ encodes all the plaintext elements that were encoded in the plaintext slots corresponding to one of the $S_i$'s (i.e., all the slots in $T_m \cap S_i$ for some $S_i$).

We begin by using the transformation from the previous section to construct from $c$ the collection of $u$ small-ring ciphertexts $c_0, c_1, \ldots, c_{u-1}$ that encrypt the polynomials $a_0, a_1, \ldots, a_{u-1} \in R_{w,2}$, respectively, where the $a_k$'s are related to the original $a$ via the assembly formula $a(X) = \sum_{k=0}^{u-1} X^k \cdot a_k(X^u) \bmod (\Phi_m, 2)$.

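Considering all of these 0-1 polynomials as members of $\mathbb{F}_{2^d}[X]$, and letting $\zeta \in \mathbb{F}_{2^d}$ be a primitive $m$'th root of unity (so $\zeta$ is a root of $[\Phi_m \bmod 2]$ over $\mathbb{F}_{2^d}$), the assembly formula implies in particular that

$$a(\zeta^j) = \sum_{k=0}^{u-1} \zeta^{jk} \cdot a_k(\zeta^{ju}) = \sum_{k=0}^{u-1} \zeta^{jk} \cdot a_k(\tau^j) \quad \text{for every } j \in S_i$$

(where $\tau = \zeta^u$). Observing that $S_i$ satisfies the conditions of Lemma 6, let $h \in R_{w,2}$ be the polynomial satisfying $h(\tau^j) = \zeta^j$ for all $j \in S_i$. Further, let us denote $h_k = (h^k \bmod (\Phi_w, 2)) \in R_{w,2}$. Since for all $j \in S_i$, $\tau^j$ is a primitive $w$'th root of unity (and hence a root of $[\Phi_w \bmod 2]$ over $\mathbb{F}_{2^d}$), we get

$$h_k(\tau^j) = h(\tau^j)^k = \zeta^{jk} \quad \text{for every } j \in S_i.$$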

We now set $c^* = \sum_{k=0}^{u-1} h_k \cdot c_k \bmod (\Phi_w, q)$, and note that this is a linear combination of the valid ciphertexts $c_k$ with low-norm coefficients. (The $h_k$'s have low norm because they are 0-1 polynomials.) Using the additive homomorphism of the cryptosystem (over the small ring $R_w$), this means that $c^*$ is still a valid small-ring ciphertext, encrypting the polynomial $a^* = \sum_{k=0}^{u-1} h_k \cdot a_k \bmod (\Phi_w, 2) \in R_{w,2}$. Moreover, by our definition of the $h_k$'s we have that for all $j \in T_m \cap S_i$,

$$a^*(\tau^j) = \sum_{k=0}^{u-1} h_k(\tau^j) \cdot a_k(\tau^j) = \sum_{k=0}^{u-1} \zeta^{jk} \cdot a_k(\tau^j) = a(\zeta^j).$$

Using our encoding conventions from the beginning of this section, this means that the content of the plaintext slots of $a^*$ is exactly the content of the plaintext slots in $a$ corresponding to $T_m \cap S_i$.

Ring-switching for "sparsely populated" ciphertexts. We mentioned that when the original big-ring ciphertext was sparsely populated, we would like to reduce it to only a small number of small-ring ciphertexts, only as many as needed to hold all the plaintext slots that contain real data. If the full slots are not already packed together in one (or a few) of the parts $S_i$, then we can apply the slot permutation techniques of Gentry et al. [12] to pack them as needed inside the big-ring ciphertext, before breaking it into small-ring ciphertexts.



APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


4.2  The  General  Case 


The above treatment relies on the order of 2 in $(\mathbb{Z}/w\mathbb{Z})^*$ and $(\mathbb{Z}/m\mathbb{Z})^*$ being the same $d$. However, the only part that relies on this fact is Lemma 6, where we needed it in order to prove that the polynomial $h$ is defined over the base field. In the general case this no longer holds, so although we can define the polynomials $h_k$ (and therefore $a^*$) just as above, all of these polynomials will now have coefficients from the extension field $\mathbb{F}_{2^d}$ rather than 0-1 coefficients.* This is unavoidable in general, since we know that we cannot always encode $\mathbb{F}_{2^d}$ elements as polynomials in the small ring $R_{w,2}$.

In principle there is no problem with using plaintext arithmetic over $\mathbb{F}_{2^d}[X]/\Phi_w$ (rather than $R_{w,2} = \mathbb{F}_2[X]/\Phi_w$). Fixing a representation $\mathbb{F}_{2^d} = \mathbb{F}_2[Y]/G(Y)$, we can represent the plaintext polynomial $A(X) \in \mathbb{F}_{2^d}[X]/\Phi_w(X)$ as a bivariate polynomial $A(X, Y) \in \mathbb{F}_2[X, Y]/(\Phi_w(X), G(Y))$, writing each coefficient from $\mathbb{F}_{2^d}$ as a degree-$(d-1)$ polynomial in $Y$. This means that $A$ can be written as $A(X, Y) = \sum_{i=0}^{d-1} a_i(X) \cdot Y^i$, with the $a_i$'s 0-1 polynomials in $R_{w,2}$. An encryption of $A$ then consists of $d$ small-ring ciphertexts encrypting the $a_i$'s, and arithmetic operations can be implemented naturally using our basic operations on encryptions of the $a_i$'s. However, this is likely to be quite inefficient, probably even less efficient than keeping everything in the big ring.

We remark that in many settings, even though our plaintext slots can hold elements in $\mathbb{F}_{2^d}$, we really only use them to hold elements from a much smaller sub-field (e.g., bits or $\mathbb{F}_{2^8}$ elements). One could therefore hope that the technique from above could be generalized to map the $\mathbb{F}_{2^d}$ plaintext slots over the big ring into $\mathbb{F}_{2^{d'}}$ slots over the small ring, such that if the content of the slots happened to already belong to the subfield $\mathbb{F}_{2^{d'}}$, then it would be copied intact. Finding such a generalization for every $d' \mid d$ is an interesting open problem.

For the case where we use the plaintext slots to hold just bits, it turns out that we can use a slight adaptation of the procedure for $d' = d$. In this case, the transformation from above yields an encryption of a polynomial $A(X)$ over $\mathbb{F}_{2^d}$ that contains in its slots whatever we had in the original big-ring polynomial. In particular, it means that $A(\tau^k) \in \{0, 1\}$ for every $k$; hence in this case $A$ must be a 0-1 polynomial. So after we compute an encryption of $A$ (as a set of $d$ encryptions as above), we can just discard all the ciphertexts except the one corresponding to $a_0$.

Abstract. The security of contemporary homomorphic encryption schemes over cyclotomic number fields relies on fields of very large dimension. This large dimension is needed because of the large modulus-to-noise ratio in the key-switching matrices that are used for the top few levels of the evaluated circuit. However, a smaller modulus-to-noise ratio is used in lower levels of the circuit, so from a security standpoint it is permissible to switch to lower-dimension fields, thus speeding up the homomorphic operations for the lower levels of the circuit. However, implementing such field-switching is nontrivial, since these schemes rely on the field algebraic structure for their homomorphic properties.

A basic ring-switching operation was used by Brakerski, Gentry and
Vaikuntanathan, over rings of the form $\mathbb{Z}[X]/(X^{2^k} + 1)$, in the context
of  bootstrapping.  In  this  work  we  generalize  and  extend  this  technique 
to  work  over  any  cyclotomic  number  field,  and  show  how  it  can  be  used 
not  only  for  bootstrapping  but  also  during  the  computation  itself  (in 
conjunction  with  the  “packed  ciphertext”  techniques  of  Gentry,  Halevi 
and  Smart). 

Keywords.  Homomorphic  Encryption,  Ring-LWE 


1.  Introduction 

The  last  few  years  have  seen  a  rapid  advance  in  the  state  of  fully  homomorphic 
encryption,  yet  despite  these  advances,  the  existing  schemes  are  still  too  expensive 
for  many  practical  purposes.  In  this  paper  we  make  another  step  forward  in  making 
such  schemes  more  efficient.  In  particular,  we  present  a  technique  for  reducing 
the  dimension  of  the  ciphertexts  involved  in  the  homomorphic  computation  of 
the  lower  levels  of  a  circuit.  Our  techniques  apply  to  homomorphic  encryption 
schemes over number fields, such as the schemes of Brakerski et al. [4,5,3], as well
as  the  variants  due  to  Lopez-Alt  et  al.  [14]  and  Brakerski  [2]. 

The most efficient variants of these schemes work over number fields of the form
$\mathbb{Q}(\zeta) = \mathbb{Q}[X]/F(X)$, and in all of them the field dimension $n$, which is the degree
of $F(X)$, must be set large enough to ensure security: to support homomorphic
evaluation of depth-$L$ circuits with security parameter $\lambda$, the schemes require
$n = \Omega(L \cdot \mathrm{polylog}(\lambda))$, even under the strongest plausible hardness assumptions
for their underlying computational problems (e.g., ring-LWE [15]).^1 In practice,
the  field  dimension  for  moderately  deep  circuits  can  easily  be  many  thousands. 
For example, to be able to evaluate AES homomorphically, Gentry et al. [13] used
circuits  of  depth  L  >  50,  with  a  corresponding  field  dimension  of  over  50,000. 

As  homomorphic  operations  are  performed,  the  ratio  of  noise  to  modulus 
in  the  ciphertexts  grows.  Consequently,  it  becomes  permissible  to  use  lower- 
dimension  fields,  which  can  speed  up  further  homomorphic  computations.  However, 
since  we  must  start  with  ciphertexts  from  a  high-dimensional  field,  we  need 
a  method  for  transforming  them  into  small-field  ciphertexts  that  encrypt  the 
same  (or  related)  messages.  Such  a  “field  switching”  procedure  was  described 
by  Brakerski  et  al.  [3],  in  the  context  of  reducing  the  ciphertext  size  prior  to 
bootstrapping.  The  procedure  in  [3],  however,  is  specific  to  number  fields  of  the 
form $K_{2^k} = \mathbb{Q}[X]/(X^{2^{k-1}} + 1)$, i.e., cyclotomic number fields with power-of-2
index. Moreover, by itself it cannot be combined with the “packed evaluation”
techniques  from  [18,11].  (These  techniques  use  Chinese-remainder  encoding  to 
“pack”  many  plaintext  values  into  each  ciphertext,  and  then  each  homomorphic 
operation  is  applied  to  all  these  values  at  once.  For  our  purposes,  we  must  consider 
the  effect  of  the  field-switching  operation  on  all  these  plaintext  values.)  Extending 
and  improving  the  field  switching  procedure  is  the  goal  of  our  work. 

1.1.  Our  Contribution 

We  present  a  general  field-switching  transformation  that  can  be  applied  to  any 
cyclotomic number field $K = \mathbb{Q}(\zeta_m) \cong \mathbb{Q}[X]/\Phi_m(X)$ for arbitrary $m$ (where
$\Phi_m(X) \in \mathbb{Z}[X]$ is the $m$th cyclotomic polynomial), and works well in conjunction
with packed ciphertexts. For any divisor $m'$ of $m$, our procedure takes as input a
“big-field ciphertext” $c$ over $K$ that encrypts many plaintext values, and outputs a
“small-field ciphertext” $c'$ over $K' = \mathbb{Q}(\zeta_{m'}) \cong \mathbb{Q}[X]/\Phi_{m'}(X) \subseteq K$ that encrypts a
certain subset of the input plaintext values.^2

Our transformation relies heavily on the algebraic properties of the cyclotomic
number fields $K$, $K'$ and their respective rings of algebraic integers $R$, $R'$. In particular,
we use the interpretation of $K$ as an extension field of $K'$, and relationships
between their various embeddings into the complex numbers $\mathbb{C}$; the factorization
of integer primes in $R$ and $R'$; and the trace function that maps elements
in $K$ to the subfield $K'$. With these tools in hand, the transformation itself is
quite simple, and consists of the following three steps:

1. We first apply a key-switching operation to obtain a big-field ciphertext
over $K$ with respect to a small-field secret key $s' \in K' \subseteq K$. Proving the
security  of  this  operation  relies  on  a  novel  way  of  embedding  the  ring-LWE 
problem  over  K'  into  K,  which  may  be  of  independent  interest. 

2. Next, we multiply the resulting ciphertext by a certain element of the
ring $R \subset K$, which depends only on the subset (or other function) of the
plaintext values that we want to include in the output ciphertext.

3. Finally, we take the trace of the $K$-elements in the ciphertext, thus obtaining
an output ciphertext over the subfield $K'$, which decrypts under the secret
key $s' \in K'$ to the desired plaintext values.

^1 The schemes from [3,2] can also obtain security by using high-dimensional vectors over
low-dimensional number fields. But their most efficient variants use low-dimensional vectors over
high-dimensional fields, since the runtime of certain operations is cubic in the dimension of the
vectors.

^2 More generally, the output ciphertext can even encrypt certain linear functions of the input
plaintext values.

We  note  that  in  addition  to  being  simpler  and  more  general  than  the  transformation 
from [3], our transformation is also more efficient even when applied in the special
case of $K_{2^k}$: when switching from $K_{2^k}$ to $K_{2^{k'}}$, the transformation from [3] includes
a step where the size of the ciphertext (and hence the time that it takes to perform
operations) is expanded by a factor of $2^{k-k'}$. Our transformation does not need
that extra step, hence saving this extra factor in performance.

In Section 2 below we recall the algebraic concepts needed for our transformation,
and then the transformation itself is described in Section 3.


2.  Preliminaries 

This  work  uses  a  number  of  algebraic  concepts  and  notations;  to  assist  the  reader 
we summarize the most important ones in Table 1. For any positive integer $u$ we
let $[u] = \{0, \ldots, u-1\}$. Throughout this work, for a coset $z \in \mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$ we let
$[z]_q \in \mathbb{Z}$ denote its canonical representative in $\mathbb{Z} \cap [-q/2, q/2)$. One can also view
$[\cdot]_q$ as the operation that takes an arbitrary integer $z$ and reduces it modulo $q$ into
the interval $[-q/2, q/2)$.
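For concreteness, here is a minimal Python sketch of the centered reduction $[z]_q$ (the helper name `centered_mod` is ours, not the paper's). It relies on Python's `%` returning a value in $[0, q)$ for positive $q$, and shifts that window by $\lfloor q/2 \rfloor$.

```python
def centered_mod(z: int, q: int) -> int:
    """Return [z]_q: the integer congruent to z modulo q that lies in [-q/2, q/2)."""
    if q <= 0:
        raise ValueError("q must be a positive integer")
    half = q // 2
    # Python's % with q > 0 yields a value in [0, q); shifting by q // 2 before
    # and after the reduction moves that window to [-(q // 2), q - q // 2).
    return (z + half) % q - half


if __name__ == "__main__":
    for q in (7, 10):
        print(q, [centered_mod(z, q) for z in range(q)])
```

For $q = 10$ the residues $0, \ldots, 9$ map to $0, 1, 2, 3, 4, -5, -4, -3, -2, -1$; the value $-5$ (rather than $5$) is chosen because the interval is half-open.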

2.1.  Algebraic  Background 

Recall that an ideal $I$ in a commutative ring $R$ is a nontrivial (i.e., $I \neq \emptyset$ and
$I \neq \{0\}$) additive subgroup which is closed under multiplication by $R$. For ideals
$I, J$, their sum is the ideal $I + J = \{a + b : a \in I, b \in J\}$, and their product $IJ$ is
the ideal consisting of all sums of terms $ab$ for $a \in I, b \in J$. An $R$-ideal $\mathfrak{p}$ is prime
if $ab \in \mathfrak{p}$ (for some $a, b \in R$) implies $a \in \mathfrak{p}$ or $b \in \mathfrak{p}$ (or both). All the rings we
work with have unique factorization of ideals into powers of prime ideals, and a
Chinese Remainder Theorem.

A fractional ideal is, informally, an ideal with a denominator. Formally, letting
$K$ be the field of fractions of $R$, a fractional ideal of $R$ is a subset $I \subseteq K$ for
which there exists a denominator $d \in R$ such that $dI \subseteq R$ is an ideal in $R$. For
an $R$-ideal $I$, the quotient ring $R_I = R/I$ consists of the residue classes $a + I$ for
all $a \in R$, with the ring operations induced by $R$. More generally, for a (possibly
fractional) ideal $I$ and an ideal $J \subseteq R$, the quotient $I_J = I/IJ$ is an additive
group, and an $R$-module, with addition and multiplication operations induced by
$R$. We often write $a \bmod I$ instead of $a + I$ to denote the residue classes $a + I$,
and we write $a \equiv b \pmod{I}$ to denote that $a, b$ belong to the same residue class,
i.e., $a + I = b + I$.

For computational purposes, all of the rings and fields we work with have
efficient representations of their elements, and efficient (i.e., polynomial time in the
bit length of the arguments) algorithms for all the operations we use. For quotients
$A/B$, cosets are represented using a fixed set of distinguished representatives. In
this work we largely ignore the details of concrete representations and algorithms,
and refer to [16] for fast, specialized algorithms for working with the cyclotomic
fields and rings that we use in this work.

Notation | Description
$p$, $\mathbb{F}_{p^d}$ | The (prime) modulus of the cryptosystem's native plaintext space, and the finite field of order $p^d$.
$m$, $m'$, $n = \varphi(m)$, $n' = \varphi(m')$ | The indices of the cyclotomic fields, where $m' \mid m$. We switch from the $m$th to the $m'$th cyclotomic number field, which are of degree $n$, $n'$ (respectively) over the rationals.
$\bar{m}, d, e, f$; $\bar{m}', d', e', f'$ | $\bar{m}$ is the largest divisor of $m$ that is coprime with $p$; $d$ is the order of $p$ in $\mathbb{Z}_{\bar{m}}^*$; $e = \varphi(m)/\varphi(\bar{m})$; and $f = \varphi(\bar{m})/d$. Similarly for $\bar{m}', d', e', f'$.
$\zeta_m$, $\zeta_{m'}$ | Abstract elements of order $m$, $m'$ (respectively) over the rationals.
$K = \mathbb{Q}(\zeta_m)$, $K' = \mathbb{Q}(\zeta_{m'})$, $R = \mathbb{Z}[\zeta_m]$, $R' = \mathbb{Z}[\zeta_{m'}]$ | The cyclotomic number fields and their rings of integers.
$\sigma\colon K \to \mathbb{C}^n$, $\sigma'\colon K' \to \mathbb{C}^{n'}$ | The canonical embeddings of $K$, $K'$, which endow the number fields with a geometry.
$\mathrm{Tr}_{K/K'}\colon K \to K'$ | The trace function, which is the sum of the automorphisms of $K$ that fix $K'$ pointwise.
$R^\vee$, $(R')^\vee$ | The codifferent (or dual) fractional ideals of $R$ and $R'$ (respectively), defined as $R^\vee = \{a : \mathrm{Tr}_{K/\mathbb{Q}}(aR) \subseteq \mathbb{Z}\}$ and similarly for $(R')^\vee$.
$G = \mathbb{Z}_{\bar{m}}^*/\langle p \rangle$, $G' = \mathbb{Z}_{\bar{m}'}^*/\langle p \rangle$ | The multiplicative quotient groups that characterize the prime-ideal factorizations of $pR$, $pR'$, respectively.
$g\colon G \to G'$ | The $(f/f')$-to-1 homomorphism defined via $i \mapsto i \bmod \bar{m}'$.

Table 1. Summary of the main algebraic notations.

2.1.1.  Cyclotomic  Fields  and  Rings 

For a positive integer $m$, let $K = \mathbb{Q}(\zeta_m)$ be the $m$th cyclotomic number field, where
$\zeta_m$ is an abstract element of order $m$. (In particular, we do not view $\zeta_m$ as any
particular root of unity in $\mathbb{C}$.) The minimal polynomial of $\zeta_m$ is the $m$th cyclotomic
polynomial $\Phi_m(X) = \prod_{i \in \mathbb{Z}_m^*} (X - \eta_m^i) \in \mathbb{Z}[X]$, where $\eta_m = \exp(2\pi\sqrt{-1}/m) \in \mathbb{C}$
is the principal $m$th complex root of unity, and the roots $\eta_m^i \in \mathbb{C}$ range over all
the primitive complex $m$th roots of unity. Therefore, $K$ is a field extension of
degree $n = \varphi(m)$ over $\mathbb{Q}$, and is isomorphic to the polynomial ring $\mathbb{Q}[X]/\Phi_m(X)$
by identifying $\zeta_m$ with $X$. (There are other representations of $K$ as well, and
nothing in this work depends on a particular choice of representation.) The ring
of (algebraic) integers in $K$, called the $m$th cyclotomic ring, is $R = \mathbb{Z}[\zeta_m]$, which
is isomorphic to $\mathbb{Z}[X]/\Phi_m(X)$.
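The cyclotomic polynomial can be computed exactly from the standard identity $X^m - 1 = \prod_{d \mid m} \Phi_d(X)$. The Python sketch below (helper names are ours) divides $X^m - 1$ by $\Phi_d$ for every proper divisor $d$ of $m$ using integer long division, so the degree of the result should equal $\varphi(m)$.

```python
from functools import lru_cache
from math import gcd


def euler_phi(m: int) -> int:
    """|Z_m^*|, counted directly (equals 1 for m = 1)."""
    return sum(1 for i in range(m) if gcd(i, m) == 1)


def divide_monic(num, den):
    """Exact division of integer polynomials (lowest degree first) by a monic den."""
    num = list(num)
    k = len(den) - 1
    quot = [0] * (len(num) - k)
    for top in range(len(num) - 1, k - 1, -1):
        c = num[top]
        quot[top - k] = c
        if c:
            for j, dj in enumerate(den):
                num[top - k + j] -= c * dj
    if any(num[:k]):
        raise ValueError("division is not exact")
    return quot


@lru_cache(maxsize=None)
def cyclotomic(m: int) -> tuple:
    """Coefficients of Phi_m(X), lowest degree first."""
    poly = [-1] + [0] * (m - 1) + [1]  # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly = divide_monic(poly, cyclotomic(d))
    return tuple(poly)


if __name__ == "__main__":
    print(cyclotomic(12))
```

For example, `cyclotomic(12)` returns `(1, 0, -1, 0, 1)`, i.e. $\Phi_{12}(X) = X^4 - X^2 + 1$.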

The field extension $K/\mathbb{Q}$ has $n$ automorphisms $\tau_i\colon K \to K$ that fix $\mathbb{Q}$ pointwise,
which are characterized by $\tau_i(\zeta_m) = \zeta_m^i$ for $i \in \mathbb{Z}_m^*$. (Equivalently, $\tau_i(a(X)) =
a(X^i) \bmod \Phi_m(X)$ when viewing $K$ as $\mathbb{Q}[X]/\Phi_m(X)$.) Because $K/\mathbb{Q}$ is Galois (i.e.,
the number of automorphisms equals the dimension of the extension), the $\mathbb{Q}$-linear^3
(field) trace $\mathrm{Tr}_{K/\mathbb{Q}}\colon K \to \mathbb{Q}$ can be defined as the sum of the automorphisms:
$\mathrm{Tr}_{K/\mathbb{Q}}(a) = \sum_{i \in \mathbb{Z}_m^*} \tau_i(a) \in \mathbb{Q}$. (See below for another formulation.)
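The automorphism and trace formulas translate directly into code. The following Python sketch (our illustration, not from the paper) represents elements of $K$ by integer coefficient lists in $\mathbb{Q}[X]/\Phi_m(X)$, implements $\tau_i(a(X)) = a(X^i) \bmod \Phi_m(X)$ using $X^m \equiv 1$, and sums over $i \in \mathbb{Z}_m^*$. To stay short it obtains $\Phi_m$ by expanding $\prod_i (X - \eta_m^i)$ in floating point and rounding, which we assume is accurate only for small $m$.

```python
import cmath
from math import gcd


def units(m):
    """The index set Z_m^*."""
    return [i for i in range(m) if gcd(i, m) == 1]


def cyclotomic(m):
    """Phi_m(X) = prod_{i in Z_m^*} (X - eta_m^i), expanded numerically and rounded."""
    coeffs = [1 + 0j]  # lowest degree first
    for i in units(m):
        root = cmath.exp(2j * cmath.pi * i / m)
        new = [0j] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k + 1] += c       # X * (c X^k)
            new[k] -= root * c    # -root * (c X^k)
        coeffs = new
    return [round(c.real) for c in coeffs]


def reduce_mod(a, phi):
    """Reduce an integer polynomial modulo the monic polynomial phi."""
    a = list(a)
    n = len(phi) - 1
    for top in range(len(a) - 1, n - 1, -1):
        c = a[top]
        if c:
            for j, pj in enumerate(phi):
                a[top - n + j] -= c * pj
    return (a + [0] * n)[:n]


def tau(a, i, m, phi):
    """Automorphism tau_i: a(X) -> a(X^i) mod Phi_m(X); exponents are taken mod m since X^m = 1 in K."""
    b = [0] * m
    for k, c in enumerate(a):
        b[(k * i) % m] += c
    return reduce_mod(b, phi)


def trace(a, m):
    """Tr_{K/Q}(a) as the sum of tau_i(a) over i in Z_m^*; the sum is a constant polynomial."""
    phi = cyclotomic(m)
    total = [0] * (len(phi) - 1)
    for i in units(m):
        total = [x + y for x, y in zip(total, tau(a, i, m, phi))]
    assert all(c == 0 for c in total[1:]), "the trace should lie in Q"
    return total[0]


if __name__ == "__main__":
    print(trace([1], 7), trace([0, 1], 7))
```

For instance, $\mathrm{Tr}(1) = \varphi(7) = 6$ and $\mathrm{Tr}(\zeta_7) = -1$ in the 7th cyclotomic field.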

Similarly to the automorphisms $\tau_i$ (which map $K$ to itself), there are $n$
concrete ways of viewing $K$ as a subfield of the complex numbers $\mathbb{C}$. Namely,
there are $n$ injective ring homomorphisms from $K$ to $\mathbb{C}$ that fix $\mathbb{Q}$ pointwise,
called embeddings, which are denoted $\sigma_i\colon K \to \mathbb{C}$ for $i \in \mathbb{Z}_m^*$ and characterized by
$\sigma_i(\zeta_m) = \eta_m^i$. The embeddings may be seen as the compositions of the abstract
automorphisms $\tau_i$ with the complex embedding that identifies $\zeta_m \in K$ with
$\eta_m \in \mathbb{C}$. Therefore, the field trace can also be written as the sum of the embeddings,
as $\mathrm{Tr}_{K/\mathbb{Q}}(a) = \sum_{i \in \mathbb{Z}_m^*} \sigma_i(a) \in \mathbb{Q}$. The canonical embedding $\sigma\colon K \to \mathbb{C}^n$ is the
concatenation of all the complex embeddings, i.e., $\sigma(a) = (\sigma_i(a))_{i \in \mathbb{Z}_m^*}$, and it
endows $K$ with a canonical geometry. In particular, define the Euclidean ($\ell_2$) and
$\ell_\infty$ norms on $K$ as

$$\|a\| := \|\sigma(a)\| = \Big(\sum_{i \in \mathbb{Z}_m^*} |\sigma_i(a)|^2\Big)^{1/2} \quad\text{and}\quad \|a\|_\infty := \|\sigma(a)\|_\infty = \max_{i \in \mathbb{Z}_m^*} |\sigma_i(a)|,$$

respectively. Note that $\|a \cdot b\| \le \|a\|_\infty \cdot \|b\|$ and $\|a \cdot b\|_\infty \le \|a\|_\infty \cdot \|b\|_\infty$ for any
$a, b \in K$, because the $\sigma_i$ are ring homomorphisms.
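The canonical embedding and the two norms can be evaluated numerically by substituting the primitive roots $\eta_m^i$ into a coefficient representation. The following sketch (our own, with hypothetical helper names and floating-point tolerances) also spot-checks the two product inequalities stated above. Because each $\sigma_i$ is evaluation at a root of $\Phi_m$, it can be applied to an unreduced polynomial product.

```python
import cmath
import math
from math import gcd


def sigma(a, m):
    """Canonical embedding (a(eta_m^i))_{i in Z_m^*}; a lists coefficients of X^0, X^1, ..."""
    return [sum(c * cmath.exp(2j * cmath.pi * i * k / m) for k, c in enumerate(a))
            for i in range(m) if gcd(i, m) == 1]


def norm2(a, m):
    """Euclidean norm ||a|| = ||sigma(a)||_2."""
    return math.sqrt(sum(abs(z) ** 2 for z in sigma(a, m)))


def norm_inf(a, m):
    """Infinity norm ||a||_inf = max_i |sigma_i(a)|."""
    return max(abs(z) for z in sigma(a, m))


def poly_mul(a, b):
    """Plain polynomial product; no reduction modulo Phi_m is needed before embedding."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    m, a, b = 12, [1, 2, -1, 3], [0, -1, 4, 2, 1]
    ab = poly_mul(a, b)
    print(norm2(ab, m), "<=", norm_inf(a, m) * norm2(b, m))
    print(norm_inf(ab, m), "<=", norm_inf(a, m) * norm_inf(b, m))
```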

2.1.2.  Towers  of  Cyclotomics 

For any positive integer $m'$ dividing $m$, let $K' = \mathbb{Q}(\zeta_{m'})$ and $R' = \mathbb{Z}[\zeta_{m'}]$ be
the $m'$th cyclotomic field and ring (of dimension $n' = \varphi(m')$ over $\mathbb{Q}$ and $\mathbb{Z}$),
respectively. As above, the field extension $K'/\mathbb{Q}$ has $n' = \varphi(m')$ automorphisms
$\tau'_{i'}\colon K' \to K'$ and $n'$ complex embeddings $\sigma'_{i'}\colon K' \to \mathbb{C}$ (for $i' \in \mathbb{Z}_{m'}^*$), the latter
of which define the canonical embedding $\sigma'\colon K' \to \mathbb{C}^{n'}$.

We will use extensively the fact that $K$ is a field extension of $K'$, and $R$ is
a ring extension of $R'$, both of dimension $n/n'$ (because $K/\mathbb{Q}$ and $K'/\mathbb{Q}$ have
dimensions $n$ and $n'$, respectively). That is, $K'$ and $R'$ may respectively be seen as
a subfield of $K = K'(\zeta_m)$ and a subring of $R = R'[\zeta_m]$, under the ring embedding
that identifies $\zeta_{m'}$ with $\zeta_m^{m/m'}$. Moreover, the field extension $K/K'$ is Galois, i.e.,
it has $n/n'$ automorphisms that fix $K'$ pointwise, which are precisely those $\tau_i$ for
which $i \equiv 1 \pmod{m'}$. This follows from the fact that

$$\tau_i(\zeta_{m'}) = \tau_i\big(\zeta_m^{m/m'}\big) = \zeta_m^{(m/m') i \bmod m} = \zeta_{m'}^{i \bmod m'}, \qquad (1)$$

and that reducing modulo $m'$ induces an $(n/n')$-to-1 mapping from $\mathbb{Z}_m^*$ to $\mathbb{Z}_{m'}^*$.
The $K'$-linear (intermediate) trace function $\mathrm{Tr}_{K/K'}\colon K \to K'$ may be defined as
the sum of these automorphisms:

$$\mathrm{Tr}_{K/K'}(a) = \sum_{i \in \mathbb{Z}_m^*,\ i \equiv 1 \ (\mathrm{mod}\ m')} \tau_i(a).$$


^3 A function $f$ is $S$-linear if $f(a + b) = f(a) + f(b)$ and $f(s \cdot a) = s \cdot f(a)$ for all $s \in S$ and all
$a, b$.
A standard fact from field theory is that the intermediate trace satisfies $\mathrm{Tr}_{K/\mathbb{Q}} =
\mathrm{Tr}_{K'/\mathbb{Q}} \circ \mathrm{Tr}_{K/K'}$. Another standard fact is that $\mathrm{Tr}_{K/K'}$ is a “universal” $K'$-linear
function, in that any such function $L\colon K \to K'$ can be expressed as $L(a) =
\mathrm{Tr}_{K/K'}(r \cdot a)$ for some fixed $r \in K$.

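Similarly to Equation (1), for any $i \in \mathbb{Z}_m^*$ the embedding $\sigma_i$ coincides with
$\sigma'_{i \bmod m'}$ on the subfield $K'$. Using this fact we get the following relation between
the intermediate trace and the complex embeddings of $K$ and $K'$.

Lemma 2.1. For any $a \in K$ and $i' \in \mathbb{Z}_{m'}^*$,

$$\sigma'_{i'}\big(\mathrm{Tr}_{K/K'}(a)\big) = \sum_{i \in \mathbb{Z}_m^*,\ i \equiv i' \ (\mathrm{mod}\ m')} \sigma_i(a).$$

In matrix form, $\sigma'(\mathrm{Tr}_{K/K'}(a)) = P \cdot \sigma(a)$, where $P$ is the $\varphi(m')$-by-$\varphi(m)$ matrix
(with rows indexed by $i' \in \mathbb{Z}_{m'}^*$ and columns by $i \in \mathbb{Z}_m^*$) whose $(i', i)$th entry is 1
if $i \equiv i' \pmod{m'}$, and is 0 otherwise.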

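Proof. Fix an arbitrary $k \in \mathbb{Z}_m^*$ such that $k \equiv i' \pmod{m'}$. Then because $\sigma'_{i'}$
coincides with $\sigma_k$ on $K'$, and by definition of $\mathrm{Tr}_{K/K'}$ and linearity of $\sigma_k$, we have

$$\sigma'_{i'}\big(\mathrm{Tr}_{K/K'}(a)\big) = \sigma_k\big(\mathrm{Tr}_{K/K'}(a)\big) = \sum_{j \in \mathbb{Z}_m^*,\ j \equiv 1 \ (\mathrm{mod}\ m')} \sigma_k(\tau_j(a)) = \sum_{i \in \mathbb{Z}_m^*,\ i \equiv i' \ (\mathrm{mod}\ m')} \sigma_i(a),$$

where for the last equality we have used $\sigma_k \circ \tau_j = \sigma_{k \cdot j}$ and $k \in \mathbb{Z}_m^*$, so $i = k \cdot j \in \mathbb{Z}_m^*$
runs over all indices congruent to $i'$ modulo $m'$ when $j \in \mathbb{Z}_m^*$ runs over all indices
congruent to 1 modulo $m'$. □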


An  immediate  corollary  is  that  the  intermediate  trace  maps  short  elements  of  K 
to  short  elements  of  K' . 

Corollary 2.2. For any $a \in K$, we have $\|\mathrm{Tr}_{K/K'}(a)\| \le \|a\| \cdot \sqrt{n/n'}$.

Proof. By Lemma 2.1, we have $\sigma'(\mathrm{Tr}_{K/K'}(a)) = P \cdot \sigma(a)$. The rows of $P$ are
orthogonal (since each column of $P$ has exactly one nonzero entry), and each has
Euclidean norm exactly $\sqrt{n/n'}$. □
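Lemma 2.1 and Corollary 2.2 can be sanity-checked numerically. The sketch below (our own illustration, with hypothetical helper names) computes $\sigma'(\mathrm{Tr}_{K/K'}(a))$ twice: once from the definition of the trace as a sum of the automorphisms $\tau_j$ with $j \equiv 1 \pmod{m'}$, evaluated through some $\sigma_k$ with $k \equiv i' \pmod{m'}$, and once as $P \cdot \sigma(a)$. It then compares the Euclidean norms against the bound $\sqrt{n/n'}$. Agreement is only up to floating-point tolerance, which is an assumption of the sketch.

```python
import cmath
import math
from math import gcd


def units(m):
    """The index set Z_m^* (with the convention Z_1^* = {0})."""
    return [i for i in range(m) if gcd(i, m) == 1]


def evaluate(a, m, i):
    """sigma_i(a) = a(eta_m^i) for a given by coefficients of X^0, X^1, ..."""
    return sum(c * cmath.exp(2j * cmath.pi * i * k / m) for k, c in enumerate(a))


def trace_embedding_direct(a, m, mp):
    """sigma'(Tr_{K/K'}(a)) from the definition: pick k = i' (mod m'), then
    sum sigma_k(tau_j(a)) = a(eta_m^{k j}) over j in Z_m^* with j = 1 (mod m')."""
    out = {}
    for ip in units(mp):
        k = next(i for i in units(m) if i % mp == ip)
        out[ip] = sum(evaluate(a, m, (k * j) % m) for j in units(m) if j % mp == 1 % mp)
    return out


def trace_embedding_lemma(a, m, mp):
    """sigma'(Tr_{K/K'}(a)) = P . sigma(a): add sigma_i(a) over all i = i' (mod m')."""
    out = {ip: 0j for ip in units(mp)}
    for i in units(m):
        out[i % mp] += evaluate(a, m, i)
    return out


def norm(vec):
    return math.sqrt(sum(abs(z) ** 2 for z in vec))


if __name__ == "__main__":
    a, m, mp = [2, -1, 0, 3], 12, 4
    print(trace_embedding_direct(a, m, mp))
    print(trace_embedding_lemma(a, m, mp))
```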


2.1.3.  Prime  Splitting  and  Plaintext  Arithmetic 

We  now  describe  the  factorization  (“splitting”)  of  prime  integers  in  cyclotomic 
rings,  how  it  allows  for  encoding  and  operating  on  several  finite-field  elements, 
and the particular functions induced by the (intermediate) trace function $\mathrm{Tr}_{K/K'}$.
Further details and proofs can be found in many texts on algebraic number theory,
e.g., [19].
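[Figure 1 diagram; surviving labels: $\mathfrak{p}_1$, $\mathfrak{p}_{15}$, $\mathfrak{p}_{22}$, $\mathfrak{p}_3$, $\mathfrak{p}_{17}$, $\mathfrak{p}_{31}$.]

Figure 1. Factorization of $2 \in \mathbb{Z}$ into distinct prime ideals $\mathfrak{p}'_{i'}$ in $R' = \mathbb{Z}[\zeta_7]$, and $\mathfrak{p}_i$ in $R = \mathbb{Z}[\zeta_{91}]$.
The displayed subscripts indicate a choice of representatives from the cosets of the multiplicative
subgroups $\langle 2 \rangle \subseteq \mathbb{Z}_7^*$ and $\langle 2 \rangle \subseteq \mathbb{Z}_{91}^*$, which have orders $d' = 3$ and $d = 12$, respectively.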

Prime splitting. Let $p \in \mathbb{Z}$ be a prime integer. In the $m$th cyclotomic ring
$R = \mathbb{Z}[\zeta_m]$ (which has degree $n = \varphi(m)$ over $\mathbb{Z}$), $pR$ is often not a prime ideal, but
instead factors into prime ideals. To describe how, we first need to introduce some
notation. Divide out all the factors of $p$ from $m$, writing $m = \bar{m} \cdot p^k$ where $p \nmid \bar{m}$. Let
$e = \varphi(p^k)$, and let $d$ be the multiplicative order of $p$ modulo $\bar{m}$ (i.e., in $\mathbb{Z}_{\bar{m}}^*$); note
that $d$ divides $\varphi(\bar{m}) = n/e$. (The values $d$, $e$ are respectively called the inertial
degree and ramification index of $p$ in $R$.) Let $G = \mathbb{Z}_{\bar{m}}^*/\langle p \rangle$, the multiplicative
quotient group $\mathbb{Z}_{\bar{m}}^*$ modulo the order-$d$ subgroup generated by $p$, so $G$ has order
$f = \varphi(\bar{m})/d = n/(de)$. For an element $i \in G$ of this group, we sometimes write
$i\langle p \rangle$ to emphasize that it is a coset, and (slightly abusing notation) also let $i \in \mathbb{Z}_{\bar{m}}^*$
denote some element of the coset. The ideal $pR$ factors as

$$pR = \prod_{i \in G} \mathfrak{p}_i^{e}, \qquad (2)$$

where the $\mathfrak{p}_i$ are distinct prime ideals in $R$, all having norm $|R/\mathfrak{p}_i| = p^d$. These
are called the prime ideals lying over $p$ in $R$. Each quotient ring $R/\mathfrak{p}_i$ is therefore
isomorphic to the finite field $\mathbb{F}_{p^d}$. (In fact there are exactly $d$ isomorphisms between
them, because $\mathbb{F}_{p^d}$ has $d$ automorphisms.)
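The splitting parameters are easy to compute by brute force. The following Python sketch (hypothetical helper names, our illustration) returns $\bar{m}$, $d$, $e$, $f$ for given $m$ and $p$, and lists the cosets of $\langle p \rangle$ in $\mathbb{Z}_{\bar{m}}^*$, i.e. the elements of $G$. For $m = 91$, $p = 2$ it gives $d = 12$, $e = 1$, $f = 6$, and for $m' = 7$ it gives $d' = 3$, $f' = 2$, matching the orders quoted in Figure 1.

```python
from math import gcd


def euler_phi(n):
    return sum(1 for i in range(n) if gcd(i, n) == 1)


def splitting_params(m, p):
    """Return (mbar, d, e, f) for the prime p in the m-th cyclotomic ring."""
    mbar = m
    while mbar % p == 0:
        mbar //= p
    # d = multiplicative order of p modulo mbar (d = 1 when mbar = 1)
    d, x = 1, p % mbar
    while x != 1 % mbar:
        x = (x * p) % mbar
        d += 1
    e = euler_phi(m) // euler_phi(mbar)   # ramification index phi(p^k)
    f = euler_phi(mbar) // d              # number of primes lying over p
    return mbar, d, e, f


def coset_of(i, mbar, p):
    """The coset i<p> inside Z_mbar^*, as a frozenset."""
    out, x = set(), i % mbar
    while x not in out:
        out.add(x)
        x = (x * p) % mbar
    return frozenset(out)


def quotient_group(mbar, p):
    """The distinct cosets of <p> in Z_mbar^*, i.e. the elements of G."""
    return {coset_of(i, mbar, p) for i in range(mbar) if gcd(i, mbar) == 1}


if __name__ == "__main__":
    print(splitting_params(91, 2), splitting_params(7, 2))
```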

Concretely, the prime ideals $\mathfrak{p}_i$, and the isomorphisms between $R/\mathfrak{p}_i$ and (some
canonical representation of) $\mathbb{F}_{p^d}$, are as follows. Let $\omega_{\bar{m}}$ denote some arbitrary
element of order $\bar{m}$ in $\mathbb{F}_{p^d}$; such an element exists because the multiplicative group
$\mathbb{F}_{p^d}^*$ is cyclic and has order $p^d - 1 \equiv 0 \pmod{\bar{m}}$. For any $i\langle p \rangle \in G$, the prime ideal
$\mathfrak{p}_i$ is the kernel of the ring homomorphism $h_i\colon R \to \mathbb{F}_{p^d}$ defined by $h_i(\zeta_m) = \omega_{\bar{m}}^i$.
It is immediate that this kernel is an ideal; furthermore, it is invariant under the
choice of representative $i$ from the coset $i\langle p \rangle$, because $h_{ip}(r) = h_i(r)^p$ for any
$r \in R$ (since $(a + b)^p = a^p + b^p$ for any $a, b \in \mathbb{F}_{p^d}$). Because $\mathfrak{p}_i$ is the kernel of $h_i$,
we have the induced isomorphism $h_i\colon R/\mathfrak{p}_i \to \mathbb{F}_{p^d}$; indeed, we have $d$ distinct such
isomorphisms, one for each element of the coset $i\langle p \rangle$.

Looking ahead, the isomorphisms $h_i$ (for appropriate choices of representatives
$i$) will be used to define several “plaintext slots” in a homomorphic cryptosystem,
i.e., an encoding of $f$ plaintext elements of $\mathbb{F}_{p^d}$ as a single element of
the cryptosystem's plaintext ring $R/pR$.

Splitting  in  cyclotomic  towers.  Of  course,  the  above  derivation  also  applies  to  the 
ideals that lie over $p$ in $R' = \mathbb{Z}[\zeta_{m'}] \subseteq R$. For each such ideal $\mathfrak{p}'$, we next describe
the factorization of $\mathfrak{p}'R$ into prime ideals in $R$. These are the prime ideals that lie
over  p'  in  R,  and  since  “lying  over”  is  an  associative  property,  they  also  lie  over  p 
(as  illustrated  in  Figure  1). 

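Let $\bar{m}$, $d$, $e$, $f$, $G$ and the prime ideals $\mathfrak{p}_i$ for $i \in G$ be as above for $R$, and
define $\bar{m}'$, $d'$, $e'$, $f'$, $G' = \mathbb{Z}_{\bar{m}'}^*/\langle p \rangle$ and prime ideals $\mathfrak{p}'_{i'}$ for $i' \in G'$ similarly for $R'$.
Note that $d' \mid d$, $e' \mid e$, and $f' \mid f$, and that the natural homomorphism $g\colon G \to G'$
defined via $i \mapsto i \bmod \bar{m}'$ is surjective and $(f/f')$-to-1. Then for every $i' \in G'$, the
factorization of $\mathfrak{p}'_{i'}R$ is

$$\mathfrak{p}'_{i'}R = \prod_{i \in g^{-1}(i')} \mathfrak{p}_i^{e/e'} = \prod_{i \in G,\ i \equiv i' \ (\mathrm{mod}\ \bar{m}')} \mathfrak{p}_i^{e/e'}.$$

Therefore, there are $f/f'$ prime ideals of $R$ lying over each $\mathfrak{p}'_{i'}$, and taken over all
$i' \in G'$ they partition the prime ideals of $R$ lying over $p$.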

Plaintext encoding. Let $\mathbb{F} = \mathbb{F}_{p^d}$ and $\mathbb{F}' = \mathbb{F}_{p^{d'}} \subseteq \mathbb{F}$. By the above and the
Chinese Remainder Theorem, the natural ring homomorphisms yield the following
(where $\cong$ denotes a ring isomorphism):

$$R'/pR' \to R'\Big/\prod_{i' \in G'} \mathfrak{p}'_{i'} \cong \bigoplus_{i' \in G'} R'/\mathfrak{p}'_{i'} \cong (\mathbb{F}')^{f'},$$

$$R/pR \to R\Big/\prod_{i \in G} \mathfrak{p}_i = R\Big/\prod_{i' \in G'} \prod_{i \in g^{-1}(i')} \mathfrak{p}_i \cong \bigoplus_{i' \in G'} \bigoplus_{i \in g^{-1}(i')} R/\mathfrak{p}_i \cong \big(\mathbb{F}^{f/f'}\big)^{f'}.$$

(Note that the first homomorphism in each line is surjective, but not necessarily
an isomorphism, due to possible ramification.) Following [18,3,11,12,13], in the
context of homomorphic encryption the above morphisms allow for encoding a
vector of $f'$ individual elements of $\mathbb{F}'$ (respectively, $f$ elements of $\mathbb{F}$) into the
plaintext ring $R'_p = R'/pR'$ (resp., $R_p = R/pR$), so that a single homomorphic
addition and multiplication acts component-wise on the underlying vectors of field
elements.
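To make the slot encoding concrete, here is a toy Python sketch of the simplest case $d = 1$, with hypothetical parameters $m = 7$ and $p = 29$ (so $p \equiv 1 \pmod m$, $G = \mathbb{Z}_7^*$, and there are $f = 6$ slots, each a copy of $\mathbb{F}_{29}$). Decoding applies the maps $h_i$, i.e. evaluation at $\omega^i$ for an element $\omega$ of order $m$; encoding inverts them by Lagrange interpolation, which is one way to realize the CRT isomorphism. It is neither efficient nor a general ($d > 1$) implementation.

```python
from math import gcd

# Hypothetical toy parameters: P = 1 (mod M), so d = 1 and there are phi(M) slots.
M, P = 7, 29
UNITS = [i for i in range(M) if gcd(i, M) == 1]


def prime_factors(n):
    out, q = set(), 2
    while q * q <= n:
        while n % q == 0:
            out.add(q)
            n //= q
        q += 1
    if n > 1:
        out.add(n)
    return out


def root_of_unity(m, p):
    """Some element of multiplicative order exactly m in F_p (requires m | p - 1)."""
    for g in range(2, p):
        w = pow(g, (p - 1) // m, p)
        if all(pow(w, m // q, p) != 1 for q in prime_factors(m)):
            return w
    raise ValueError("no element of order m")


OMEGA = root_of_unity(M, P)
POINTS = [pow(OMEGA, i, P) for i in UNITS]  # h_i(zeta_M) = OMEGA^i


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out


# Over F_P, Phi_M(X) splits as the product of (X - OMEGA^i) over i in Z_M^*.
PHI = [1]
for x in POINTS:
    PHI = poly_mul(PHI, [-x % P, 1])


def reduce_mod_phi(a):
    a, n = list(a), len(PHI) - 1
    for top in range(len(a) - 1, n - 1, -1):
        c = a[top]
        if c:
            for j, pj in enumerate(PHI):
                a[top - n + j] = (a[top - n + j] - c * pj) % P
    return (a + [0] * n)[:n]


def decode(a):
    """Slot values (h_i(a))_{i in Z_M^*}: evaluate a at each OMEGA^i."""
    return [sum(c * pow(x, k, P) for k, c in enumerate(a)) % P for x in POINTS]


def encode(slots):
    """Lagrange interpolation: the unique a of degree < n with a(OMEGA^i) = slot i."""
    a = [0] * len(POINTS)
    for i, (xi, vi) in enumerate(zip(POINTS, slots)):
        basis, denom = [1], 1
        for j, xj in enumerate(POINTS):
            if j != i:
                basis = poly_mul(basis, [-xj % P, 1])
                denom = denom * (xi - xj) % P
        scale = vi * pow(denom, -1, P) % P
        a = [(s + scale * b) % P for s, b in zip(a, basis)]
    return a


if __name__ == "__main__":
    u, v = [3, 1, 4, 1, 5, 9], [2, 7, 1, 8, 2, 8]
    print(decode(reduce_mod_phi(poly_mul(encode(u), encode(v)))))
```

Multiplying two encodings and reducing modulo $\Phi_7$ yields an encoding of the component-wise product of the slot vectors, since each $h_i$ is a ring homomorphism.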

Trace operations. As mentioned in the introduction, our field-switching technique
is built around applying the trace function $\mathrm{Tr}_{K/K'}$ to the elements of a big-field
ciphertext, thus obtaining a related small-field ciphertext. Since we use “packed”
ciphertexts that encrypt arrays of elements in $\mathbb{F}$ via the above isomorphisms, we
need to understand the effect of the trace function on those $\mathbb{F}$-elements.

The remainder of this subsection is therefore devoted to characterizing the
functions $(\mathbb{F}^{f/f'})^{f'} \to (\mathbb{F}')^{f'}$ that can be induced by $\mathrm{Tr}_{K/K'}$. More specifically, we
determine exactly which functions

$$L\colon R\Big/\Big(\prod_{i \in G} \mathfrak{p}_i\Big) \to R'\Big/\Big(\prod_{i' \in G'} \mathfrak{p}'_{i'}\Big)$$

can be expressed as $L(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ for some fixed $r \in K$. It turns out that
by fixing an appropriate choice of isomorphisms between the quotient rings and
finite fields above, we can obtain the concatenation of any $f'$ individual $\mathbb{F}'$-linear
functions $\mathbb{F}^{f/f'} \to \mathbb{F}'$ (see Corollary 2.5 for a precise statement).^4

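As already noted, the isomorphisms between the quotient rings and finite fields
are not necessarily unique; they are determined by the choice of representatives
$i'$, $i$ of the cosets $i'\langle p \rangle \subseteq \mathbb{Z}_{\bar{m}'}^*$ and $i\langle p \rangle \subseteq \mathbb{Z}_{\bar{m}}^*$ (respectively), and roots of unity
$\omega_{\bar{m}'} \in \mathbb{F}'$ and $\omega_{\bar{m}} \in \mathbb{F}$. For our purposes, it is important to choose these in a
“consistent” fashion, as follows. First, given $\omega_{\bar{m}}$, let $\omega_{\bar{m}'} = \omega_{\bar{m}}^{\bar{m}/\bar{m}'} \in \mathbb{F}'$. (Note that
all $\varphi(\bar{m}')$ elements of order $\bar{m}'$ in $\mathbb{F}$ are indeed in the subfield $\mathbb{F}'$.) Next, let $\ell \ge 0$
be the integer exponent such that $m/m' = (\bar{m}/\bar{m}') \cdot p^{\ell}$. Then given representative
$i'$ of $i'\langle p \rangle \in G'$, choose representative $i$ for each $i\langle p \rangle \in g^{-1}(i')$ so that $p^{\ell} \cdot i \equiv i'
\pmod{\bar{m}'}$. Note that such $i$ always exists, by definition of the quotient group $G$
and the mapping $g$. As explained above, these choices fix particular isomorphisms

$$h_i\colon R/\mathfrak{p}_i \to \mathbb{F} \ \ (\text{for } i\langle p \rangle \in G) \quad\text{and}\quad h'_{i'}\colon R'/\mathfrak{p}'_{i'} \to \mathbb{F}' \ \ (\text{for } i'\langle p \rangle \in G'),$$

which are characterized by $h_i(\zeta_m) = \omega_{\bar{m}}^i$ and $h'_{i'}(\zeta_{m'}) = \omega_{\bar{m}'}^{i'}$.

Next, for each $i' \in G'$ denote the product of prime ideals lying over $\mathfrak{p}'_{i'}$ in $R$
(called the radical of $\mathfrak{p}'_{i'}R$) by $\mathfrak{p}_{i'} = \prod_{i \in g^{-1}(i')} \mathfrak{p}_i$, and define the ring isomorphism

$$h_{i'}\colon R/\mathfrak{p}_{i'} \to \mathbb{F}^{f/f'}, \qquad h_{i'}(a) = \big(h_i(a \bmod \mathfrak{p}_i)\big)_{i \in g^{-1}(i')},$$

where $\mathbb{F}^{f/f'}$ denotes the product ring with coordinate-wise operations.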

In Lemma 2.4 below, we show that under the above isomorphisms, the $\mathbb{F}'$-linear
functions $\bar{L}\colon \mathbb{F}^{f/f'} \to \mathbb{F}'$ correspond bijectively with the $R'$-linear functions
$L\colon R/\mathfrak{p}_{i'} \to R'/\mathfrak{p}'_{i'}$, for all $i' \in G'$. Recall that any function of the latter type
can be expressed as $L(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ for some fixed $r \in K$. Conversely,
every function $L$ (with domain and range as above) that can be expressed as
$L(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ is clearly $R'$-linear, so it always induces an $\mathbb{F}'$-linear function.
The heart of Lemma 2.4 is the following fact.

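Lemma 2.3. Let $\mathfrak{p}'_{i'}$, for some $i' \in G'$, be a prime ideal lying over $p$ in $R'$, and let $\mathfrak{p}_{i'}$
be the radical of $\mathfrak{p}'_{i'}R$. Let $r' \in R' \subseteq R$ be arbitrary, and let $s = h'_{i'}(r' \bmod \mathfrak{p}'_{i'}) \in
\mathbb{F}' \subseteq \mathbb{F}$. Then

$$h_{i'}(r' \bmod \mathfrak{p}_{i'}) = (s, s, \ldots, s) \in (\mathbb{F}')^{f/f'},$$

i.e., every entry of $h_{i'}(r' \bmod \mathfrak{p}_{i'})$ is equal to $h'_{i'}(r' \bmod \mathfrak{p}'_{i'})$.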

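Proof. Recall that under our choice of isomorphisms, $\omega_{\bar{m}'} = \omega_{\bar{m}}^{\bar{m}/\bar{m}'} \in \mathbb{F}'$ is of order
$\bar{m}'$, and $p^{\ell} \cdot i \equiv i' \pmod{\bar{m}'}$, where $\ell \ge 0$ is the integer satisfying $m/m' = (\bar{m}/\bar{m}') \cdot p^{\ell}$.
Also recall that

$$h_{i'}(r' \bmod \mathfrak{p}_{i'}) = \big(h_i(r' \bmod \mathfrak{p}_i)\big)_{i \in g^{-1}(i')}.$$

For the representative $i$ of each coset $i\langle p \rangle \in g^{-1}(i')$, the entry $h_i(r' \bmod \mathfrak{p}_i)$ is
obtained by mapping $\zeta_m$ to $\omega_{\bar{m}}^i$, and hence also mapping $\zeta_{m'} = \zeta_m^{m/m'}$ to

$$\omega_{\bar{m}}^{i \cdot m/m'} = \big(\omega_{\bar{m}}^{\bar{m}/\bar{m}'}\big)^{p^{\ell} \cdot i} = \omega_{\bar{m}'}^{i'} \in \mathbb{F}',$$

which is exactly the mapping done by $h'_{i'}$. Since $r' \in R' = \mathbb{Z}[\zeta_{m'}]$, this proves the
claim. □

^4 Note that any $\mathbb{F}'$-linear function $\bar{L}\colon \mathbb{F}^{f/f'} \to \mathbb{F}'$ can always be expressed as $\bar{L}(\mathbf{a}) =
\mathrm{Tr}_{\mathbb{F}/\mathbb{F}'}(\langle \mathbf{d}, \mathbf{a} \rangle)$ for some fixed $\mathbf{d} \in \mathbb{F}^{f/f'}$, where $\langle \cdot, \cdot \rangle$ is the usual inner product and $\mathrm{Tr}_{\mathbb{F}/\mathbb{F}'}$ denotes
the ($\mathbb{F}'$-linear) trace of the field extension $\mathbb{F}/\mathbb{F}'$.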


Lemma 2.4. Let $i' \in G'$ be arbitrary, and let $\mathfrak{p}' = \mathfrak{p}'_{i'}$ and $\mathfrak{p} = \mathfrak{p}_{i'}$. Then under
the isomorphisms $h' = h'_{i'}$ and $h = h_{i'}$ defined above, the $\mathbb{F}'$-linear functions
$\bar{L}\colon \mathbb{F}^{f/f'} \to \mathbb{F}'$ are in bijective correspondence with the $R'$-linear functions
$L\colon R/\mathfrak{p} \to R'/\mathfrak{p}'$.

Proof. For any $\mathbb{F}'$-linear function $\bar{L}$, we claim that $L = h'^{-1} \circ \bar{L} \circ h$ is the corresponding
$R'$-linear function. To see this, note that by Lemma 2.3 and the fact
that $h$ is a ring homomorphism, for any $r' \in R'$ and $a \in R/\mathfrak{p}$ we have

$$h(r' \cdot a) = h(r' \bmod \mathfrak{p}) \odot h(a) = h'(r' \bmod \mathfrak{p}') \cdot h(a) \in \mathbb{F}^{f/f'},$$

where multiplication $\odot$ in $\mathbb{F}^{f/f'}$ is coordinate-wise. By $\mathbb{F}'$-linearity of $\bar{L}$ and
the fact that $h'$ is a ring homomorphism, we have

$$L(r' \cdot a) = h'^{-1}\big(\bar{L}(h(r' \cdot a))\big) = h'^{-1}\big(h'(r' \bmod \mathfrak{p}') \cdot \bar{L}(h(a))\big) = r' \cdot L(a) \in R'/\mathfrak{p}',$$

as desired. The other direction proceeds essentially identically, with $\bar{L} = h' \circ L \circ h^{-1}$. □


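An application of the Chinese Remainder Theorem with the (pairwise coprime) ideals $\mathfrak{p}_{i'}$
in $R$, combined with Lemma 2.4, immediately yields the following corollary.

Corollary 2.5. Let $\mathfrak{p}' = \prod_{i' \in G'} \mathfrak{p}'_{i'}$ and $\mathfrak{p} = \prod_{i' \in G'} \mathfrak{p}_{i'}$ be the radicals of $pR'$ and $pR$,
respectively. Then under the isomorphisms $\{h'_{i'}\}_{i' \in G'}$ and $\{h_{i'}\}_{i' \in G'}$ defined above,
the $R'$-linear functions $L\colon R/\mathfrak{p} \to R'/\mathfrak{p}'$ are in bijective correspondence with the
functions $\bar{L}\colon (\mathbb{F}^{f/f'})^{f'} \to (\mathbb{F}')^{f'}$ of the form

$$\bar{L}\big((\mathbf{a}_{i'})_{i' \in G'}\big) = \big(\bar{L}_{i'}(\mathbf{a}_{i'})\big)_{i' \in G'},$$

where every $\bar{L}_{i'}\colon \mathbb{F}^{f/f'} \to \mathbb{F}'$ is $\mathbb{F}'$-linear.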

We note that given a function $\bar{L}\colon (\mathbb{F}^{f/f'})^{f'} \to (\mathbb{F}')^{f'}$ as in Corollary 2.5, we can
efficiently find an $R'$-linear function $L\colon R \to R'$ that induces the corresponding $\bar{L}$:
first, fix an arbitrary $R'$-basis $B = \{b_j\}$ of $R$. Then, using the isomorphisms $h'_{i'}$
and $h_{i'}$, the values of $L(b_j \bmod \mathfrak{p}) \in R'/\mathfrak{p}'$ are determined by $\bar{L}$, and uniquely
define $L \bmod \mathfrak{p}$ by $R'$-linearity. We can then define each $L(b_j) \in R'$ to be an arbitrary
representative of $L(b_j \bmod \mathfrak{p})$; these choices uniquely determine $L$, by $R'$-linearity.
Finally, we can represent $L$ explicitly in trace form as $L(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ for
some $r \in K$: recalling that $K$ is a vector space over $K'$ with $K'$-basis $B$, we have
a full-rank system of linear equations $L(b_j) = \mathrm{Tr}_{K/K'}(r \cdot b_j) \in K'$, which we can
solve to obtain $r \in K$.
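The final step above is a linear solve. Here is a minimal Python sketch of it, assuming the simplest case $K' = \mathbb{Q}$ (i.e., $m' = 1$) and the power basis $b_j = \zeta_m^j$ of $R$ over $\mathbb{Z}$ (both are our simplifications, not the general setting of the text). Writing $r = \sum_k r_k \zeta_m^k$, the equations $\mathrm{Tr}_{K/\mathbb{Q}}(r \cdot b_j) = L(b_j)$ become $T r = (L(b_j))_j$ with $T_{jk} = \mathrm{Tr}_{K/\mathbb{Q}}(\zeta_m^{j+k})$, which we solve exactly over $\mathbb{Q}$. The helper names are hypothetical, and the integer traces are obtained by rounding a floating-point sum, which assumes small $m$.

```python
import cmath
from fractions import Fraction
from math import gcd


def trace_power(e, m):
    """Tr_{K/Q}(zeta_m^e) = sum over i in Z_m^* of eta_m^{i e}; an integer, so we round."""
    s = sum(cmath.exp(2j * cmath.pi * i * e / m) for i in range(m) if gcd(i, m) == 1)
    return round(s.real)


def solve(A, b):
    """Solve A x = b exactly over Q by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def trace_form(m, values):
    """Coefficients r_k of r = sum_k r_k zeta_m^k with Tr(r * zeta_m^j) = values[j]."""
    n = sum(1 for i in range(m) if gcd(i, m) == 1)
    T = [[trace_power(j + k, m) for k in range(n)] for j in range(n)]
    return solve(T, values)


if __name__ == "__main__":
    print(trace_form(8, [1, 0, 0, 0]))
```

For $m = 8$ and the functional $L(a) = a_0$ (the constant coefficient, so $L(b_j) = 1, 0, 0, 0$), the solution is $r = 1/4$, since $\mathrm{Tr}_{K/\mathbb{Q}}(\zeta_8^j)$ equals 4 for $j = 0$ and 0 for $j = 1, 2, 3$.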

Looking  ahead,  in  our  application  to  homomorphic  computation  we  will  have 
certain  linear  functions  that  we  want  to  evaluate  (e.g.,  projection  functions),  and 
we  will  do  so  by  finding  the  corresponding  constant  r,  then  multiplying  by  r  and 
taking  the  trace  (see  Section  3.3  for  further  details).  To  apply  these  steps  in  the 
context  of  a  homomorphic  encryption  scheme,  we  need  the  notion  of  the  dual  of 
the  ring  of  integers,  described  next. 

2.1.4. Duality

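An important and useful object in $K$ is the dual of $R$ (also known as the codifferent
of $K$), defined as

$$R^\vee = \{a \in K : \mathrm{Tr}_{K/\mathbb{Q}}(aR) \subseteq \mathbb{Z}\} \supseteq R.$$

Because $\mathrm{Tr}_{K/\mathbb{Q}} = \mathrm{Tr}_{K'/\mathbb{Q}} \circ \mathrm{Tr}_{K/K'}$, it is easy to verify that also $R^\vee =
\{a \in K : \mathrm{Tr}_{K/K'}(aR) \subseteq (R')^\vee\}$. Therefore, we have the convenient equation

$$\mathrm{Tr}_{K/K'}(R^\vee) = (R')^\vee. \qquad (3)$$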

Note that by contrast, frequently $\mathrm{Tr}_{K/K'}(R)$ does not equal $R'$, but is instead some
proper ideal of it.^5 Many other algebraic and geometric advantages of working
with $R^\vee$ instead of $R$ are discussed in [15,16].

The codifferent is a principal fractional ideal, i.e., $R^\vee = t^{-1}R$ for some $t \in R$
(which is not unique). Therefore, division by $t$ induces a bijection from $R$ to $R^\vee$,
and from any quotient ring $R_p = R/pR$ to $R^\vee_p = R^\vee/pR^\vee$. Although the target
objects are not rings (because $R^\vee \cdot R^\vee \not\subseteq R^\vee$), they are $R$-modules, and the
bijections are $R$-module isomorphisms.
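For power-of-two $m$ (so $n = m/2$), it is a standard fact that one can take $t = n$, i.e. $R^\vee = n^{-1}R$. The Python sketch below (hypothetical helper names) does not prove this; it only checks two consequences with exact arithmetic: $\mathrm{Tr}_{K/\mathbb{Q}}(\zeta_m^j)$ equals $n$ for $j = 0$ and $0$ for $0 < j < n$ (so $\mathrm{Tr}_{K/\mathbb{Q}}(R) = n\mathbb{Z}$, a proper ideal of $\mathbb{Z}$), and $n^{-1}\zeta_m^e$ has integral trace for every $e$ while $1/(2n)$ does not.

```python
import cmath
from fractions import Fraction


def trace_power_2k(e, m):
    """Tr_{K/Q}(zeta_m^e) for m a power of two (the units mod m are the odd residues)."""
    s = sum(cmath.exp(2j * cmath.pi * i * e / m) for i in range(1, m, 2))
    return round(s.real)  # the trace of an algebraic integer is an integer


def trace_of(a, m):
    """Exact trace of a = sum_j a_j zeta_m^j with rational coefficients a_j."""
    return sum(Fraction(c) * trace_power_2k(j, m) for j, c in enumerate(a))


if __name__ == "__main__":
    m, n = 16, 8
    print([trace_power_2k(j, m) for j in range(n)])
    print(trace_of([Fraction(1, n)], m), trace_of([Fraction(1, 2 * n)], m))
```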

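Of course, we also have $(R')^\vee = t'^{-1}R'$ for some $t' \in R'$. By Equation (3) and
$R'$-linearity of the trace, for any ideal $\mathfrak{p}$ in $R'$, we have

$$\mathrm{Tr}_{K/K'}(R^\vee_{\mathfrak{p}}) = \mathrm{Tr}_{K/K'}(R^\vee/\mathfrak{p}R^\vee) = (R')^\vee/\mathfrak{p}(R')^\vee = (R')^\vee_{\mathfrak{p}}.$$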

In the previous subsection we considered $R'$-linear functions $L\colon R \to R'$ (or their induced functions $R_{\mathfrak{p}} \to R'_{\mathfrak{p}'}$), which can always be expressed as $L(a) = \mathrm{Tr}_{K/K'}(r^\vee \cdot a)$ for some fixed $r^\vee \in K$. Typically, $r^\vee$ is not in $R$ because $\mathrm{Tr}_{K/K'}(R) \not\supseteq R'$, but it is easy to see that $r^\vee \in t'R^\vee$ always, because if not, then $\mathrm{Tr}_{K/K'}(r^\vee R) \not\subseteq t'R'^\vee = R'$. For the purposes of our field-switching procedure, it will be more convenient to instead work with corresponding $R'$-linear functions from $R^\vee$ to $R'^\vee$, which can be represented in trace form by elements of $R$. Namely, for an $R'$-linear function $L\colon R \to R'$, where $L(a) = \mathrm{Tr}_{K/K'}(r^\vee \cdot a)$ for some $r^\vee \in t'R^\vee$, we will consider the corresponding function

$$L^\vee\colon R^\vee \to R'^\vee, \quad L^\vee(a^\vee) = L(t \cdot a^\vee)/t' = \mathrm{Tr}_{K/K'}\big((t/t')\, r^\vee \cdot a^\vee\big) = \mathrm{Tr}_{K/K'}(r \cdot a^\vee),$$

which is represented by $r = (t/t')\, r^\vee \in R$.

⁵ This is easily seen, e.g., for $R = \mathbb{Z}[\zeta_{2^k}]$ and $R' = \mathbb{Z}$, where $\mathrm{Tr}(R) = 2^{k-1}R'$ because $\mathrm{Tr}(1) = 2^{k-1}$ and $\mathrm{Tr}(\zeta_{2^k}^j) = 0$ for $j = 1, \ldots, 2^{k-1} - 1$.

Following [16], we extend the operation $[\cdot]_q$ to $R^\vee_q$ by fixing a particular $\mathbb{Z}$-basis of $R^\vee$ (and $\mathbb{Z}_q$-basis of $R^\vee_q$), called the decoding basis, and representing the argument as a $\mathbb{Z}_q$-combination of the basis vectors and applying the $[\cdot]_q$ operation to each of its coefficients. It is shown in [16, Section 6.2] that every sufficiently short (as always, under the canonical embedding) $e \in R^\vee$ is indeed the “canonical” representative of its coset modulo $qR^\vee$. Specifically, if $\|e\| < q/(2\sqrt{n})$ then $[e \bmod qR^\vee]_q = e$.
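A minimal sketch of the coefficient-wise operation $[\cdot]_q$, assuming the coefficients of the element with respect to the decoding basis are already available as integers (computing the decoding basis itself, and checking the norm bound, are not shown; names are hypothetical):

```python
def centered_mod(x, q):
    """Representative of x modulo q in the interval (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def reduce_coefficients(coeffs, q):
    """[.]_q applied coordinate-wise to the coefficients of an element w.r.t. a fixed (decoding) basis."""
    return [centered_mod(c, q) for c in coeffs]
```

Shifting each coefficient of a short $e$ by multiples of $q$ and then reducing recovers $e$, which is the coefficient-level analogue of $[e \bmod qR^\vee]_q = e$.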

2.1.5. Good Bases of $R$ and $R^\vee$

We now have almost all the ingredients we need to describe the homomorphic cryptosystem and our field-switching transformation. The final background material we need concerns the geometry of $R$ as a module over $R'$ (respectively, $R^\vee$ as a module over $R'^\vee$). Specifically, we construct certain “good” bases of the ring $R$ and its dual $R^\vee$ in terms of $R'$ and $R'^\vee$ (respectively), and prove some of their useful geometrical properties. This (somewhat technical) material is used only in Section 3.1, where we prove the hardness of ring-LWE over $K$ with secret in $R'$, assuming its hardness over $K'$ with secret in $R'$.

Since $K$ is a vector space of dimension $n/n'$ over $K'$, the field $K$ has a $K'$-basis (which is not unique), i.e., a set of $n/n'$ elements of $K$ that are linearly independent over $K'$, so that every element of $K$ can be represented uniquely as a $K'$-linear combination of the basis elements. Similarly, an $R'$-basis of $R$ is a set of $n/n'$ elements in $R$, such that every element of $R$ can be represented uniquely as an $R'$-linear combination of the basis elements. An $R'^\vee$-basis of $R^\vee$ is defined analogously.

We wish to construct an $R'$-basis of $R$, and a corresponding dual $R'^\vee$-basis of $R^\vee$ (both of which are $K'$-bases of $K$), which are “good” in the following sense: for any vector of $K'$-coefficients (with respect to the basis) which are short under $\sigma'$, the corresponding $K$-element is also short under $\sigma$. More formally, represent an ordered $K'$-basis of $K$ as a vector $\vec{b} = (b_j) \in K^{n/n'}$, and similarly for an arbitrary vector of $K'$-coefficients $\vec{a} = (a_j) \in K'^{n/n'}$, which defines the $K$-element $a = \langle \vec{a}, \vec{b} \rangle = \sum_j a_j \cdot b_j$. Then by linearity, the basis $\vec{b}$ induces a matrix $B \in \mathbb{C}^{n \times n}$ such that

$$\sigma(a) = B \cdot \sigma'(\vec{a}), \quad \text{where } \sigma'(\vec{a}) = \big(\sigma'(a_j)\big)_j. \quad (4)$$

We seek an $R'$-basis $\vec{b}$ of $R$ for which $B$ (nearly) preserves Euclidean norms up to some scaling factor, i.e., all of its singular values are (nearly) equal.

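In addition, for any $K'$-basis $\vec{b} = (b_j)$ of $K$, its dual $K'$-basis $\vec{b}^\vee = (b_j^\vee) \subset K$ is uniquely defined by the linear constraints $\mathrm{Tr}_{K/K'}(b_j \cdot b_{j'}^\vee) = 1$ if $j = j'$, and $0$ otherwise. It is a straightforward exercise to verify that if $\vec{b}$ is an $R'$-basis of $R$, then $\vec{b}^\vee$ is an $R'^\vee$-basis of $R^\vee$. Moreover, applying each embedding $\sigma'$ of $K'$ to these constraints and using $\sigma'(\mathrm{Tr}_{K/K'}(x)) = \sum_{\sigma|_{K'} = \sigma'} \sigma(x)$ gives $B^T B^\vee = I$, so the matrix $B^\vee$ induced by $\vec{b}^\vee$ is $B^\vee = B^{-T}$, and its singular values are simply the inverses of those of $B$.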
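For the special case $K' = \mathbb{Q}$ (so $R' = R'^\vee = \mathbb{Z}$), the dual basis can be computed by inverting the trace Gram matrix $M_{jk} = \mathrm{Tr}_{K/\mathbb{Q}}(b_j b_k)$ of the power basis $b_j = \zeta_m^j$; since $M$ is symmetric, the $k$th row of $M^{-1}$ gives the power-basis coordinates of $b_k^\vee$. The sketch below uses exact rational arithmetic and the Ramanujan-sum formula for traces of roots of unity; all names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def euler_phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def mobius(n):
    result, k = 1, 2
    while k * k <= n:
        if n % k == 0:
            n //= k
            if n % k == 0:
                return 0
            result = -result
        k += 1
    return -result if n > 1 else result


def trace_power(m, k):
    """Tr_{Q(zeta_m)/Q}(zeta_m^k) as a Ramanujan sum."""
    d = m // gcd(m, k % m)
    return mobius(d) * euler_phi(m) // euler_phi(d)


def invert(M):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix over Q."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [x / pivot for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def dual_power_basis(m):
    """Row k = power-basis coordinates of b_k^dual, so that Tr(zeta_m^j * b_k^dual) = [j == k]."""
    n = euler_phi(m)
    M = [[trace_power(m, j + k) for k in range(n)] for j in range(n)]
    return invert(M)
```

For $m = 8$ this gives $b_0^\vee = 1/4$ and $b_1^\vee = -\zeta_8^3/4 = \zeta_8^{-1}/4$, consistent with $R^\vee = \frac{1}{4}R$ for that field.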

Lemma 2.6. Let $\hat{m} = m/2$ if $m$ is even and $m'$ is odd, otherwise $\hat{m} = m$, and let $r = \mathrm{rad}(m)/\mathrm{rad}(m')$ be the product of all primes that divide $m$ but not $m'$. There exists an efficiently computable $R'$-basis $\vec{b}$ of $R$, for which the corresponding matrix $B$ has largest and smallest singular values

$$s_1(B) = \sqrt{\hat{m}/m'} \quad \text{and} \quad s_n(B) = \sqrt{m/(r m')},$$

respectively. In particular, if $r \in \{1, 2\}$ then $B$ is a unitary matrix scaled by a $\sqrt{\hat{m}/m'}$ factor.

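Lemma 2.6 implies that for any $\vec{a} \in K'^{n/n'}$, defining $a = \langle \vec{a}, \vec{b} \rangle \in K$ and $a^\vee = \langle \vec{a}, \vec{b}^\vee \rangle \in K$,

$$\|\sigma(a)\| \le \sqrt{\hat{m}/m'} \cdot \|\sigma'(\vec{a})\| \quad \text{and} \quad \|\sigma(a^\vee)\| \le \sqrt{r m'/m} \cdot \|\sigma'(\vec{a})\|. \quad (5)$$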

More generally, if the $a_j$ are independent and have Gaussian distributions over (the canonical embedding of) $K'$, then $a$ and $a^\vee$ also have (possibly non-spherical) Gaussian distributions over $K$.⁶ Since we are not too concerned with the exact distributions, we omit a precise calculation, which is standard. However, one particular case of interest is when the $a_j$ are all i.i.d. according to a spherical Gaussian of parameter $s$, and $r \in \{1, 2\}$ so that $B$ (respectively, $B^\vee$) is a scaled unitary matrix. Then because spherical Gaussians are invariant under unitary transformations, $a$ (resp., $a^\vee$) is distributed according to a spherical Gaussian of parameter $s\sqrt{\hat{m}/m'}$ (resp., $s\sqrt{m'/\hat{m}}$).

The remainder of this subsection is devoted to proving Lemma 2.6. We denote the $k$-dimensional identity matrix by $I_k$, we use $\otimes$ to denote the Kronecker (or tensor) product of vectors and matrices, and we apply functions to vectors and matrices component-wise.

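Following the treatment given in [16], let $m = \prod_\ell m_\ell$ be the prime-power factorization of $m$, i.e., the $m_\ell > 1$ are powers of distinct primes. The ring $R = \mathbb{Z}[\zeta_m]$ has the following $\mathbb{Z}$-basis $\vec{p}$, which is called the “powerful” basis:

$$\vec{p} = \bigotimes_\ell \vec{p}_{m_\ell}, \quad \text{where} \quad \vec{p}_{m_\ell} = \big(\zeta_{m_\ell}^j\big)_{j \in [\varphi(m_\ell)]}.$$

The set $\vec{p}_{m_\ell}$ is called the “power” $\mathbb{Z}$-basis of $\mathbb{Z}[\zeta_{m_\ell}] \subseteq R$.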

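Similarly, let $m' = \prod_\ell m'_\ell$, where each $m'_\ell$ divides $m_\ell$, i.e., they are both powers of the same prime (though possibly $m'_\ell = 1$). Then the powerful $\mathbb{Z}$-basis of $R'$ is defined as $\vec{p}' = \bigotimes_\ell \vec{p}_{m'_\ell}$, where the power bases $\vec{p}_{m'_\ell}$ are defined as above. Notice that when $m'_\ell > 1$, there is a bijective correspondence between $j \in [\varphi(m_\ell)]$ and $(j', k) \in [\varphi(m'_\ell)] \times [m_\ell/m'_\ell]$, via $j = (m_\ell/m'_\ell)\, j' + k$. Therefore, the power bases $\vec{p}_{m_\ell}$ factor as

$$\vec{p}_{m_\ell} = \vec{p}_{m'_\ell} \otimes \vec{b}_\ell, \quad \text{where} \quad \vec{b}_\ell = \begin{cases} \big(\zeta_{m_\ell}^k\big)_{k \in [m_\ell/m'_\ell]} & \text{if } m'_\ell > 1, \\ \vec{p}_{m_\ell} & \text{if } m'_\ell = 1. \end{cases} \quad (6)$$

Hence, using the commutativity of the Kronecker product (up to some permutation) we can factor the powerful basis $\vec{p}$ of $R$ as

$$\vec{p} = \vec{p}' \otimes \vec{b}, \quad \text{where} \quad \vec{b} = \bigotimes_\ell \vec{b}_\ell.$$

⁶ To be completely formal, the Gaussians should be over continuous spaces of the form $K \otimes_{\mathbb{Q}} \mathbb{R}$; see [16].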

Because $\vec{p}'$ is a $\mathbb{Z}$-basis of $R'$, it follows that $\vec{b}$ is an $R'$-basis of $R$. We next calculate the matrix $B \in \mathbb{C}^{n \times n}$ induced by $\vec{b}$, and verify that it indeed satisfies the claims in the lemma statement.

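Following [16, Section 3], for any prime power $m$ we define $\mathrm{CRT}_m$ to be the complex $\varphi(m)$-by-$\varphi(m)$ matrix with $\omega_m^{ij}$ in its $i$th row and $j$th column, for $i \in \mathbb{Z}_m^*$ and $j \in [\varphi(m)]$. Using the prime-power factorizations of our $m, m'$, we define $\mathrm{CRT}_m = \bigotimes_\ell \mathrm{CRT}_{m_\ell}$ and $\mathrm{CRT}_{m'} = \bigotimes_\ell \mathrm{CRT}_{m'_\ell}$. Then up to a permutation of the rows (determined by the CRT correspondence between $\mathbb{Z}_m^*$ and $\prod_\ell \mathbb{Z}_{m_\ell}^*$), we have

$$\sigma(\vec{p}^T) = \mathrm{CRT}_m,$$

i.e., the columns of $\mathrm{CRT}_m$ are $\sigma(p_j)$ for each entry $p_j$ of the row vector $\vec{p}^T$. In particular, $\sigma(\langle \vec{c}, \vec{p} \rangle) = \mathrm{CRT}_m \cdot \vec{c}$ for any $\vec{c} \in \mathbb{Q}^n$. Similarly, $\sigma'(\vec{p}'^T) = \mathrm{CRT}_{m'}$ up to a row permutation.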

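We now claim that, up to some permutations of $B$'s rows and columns,

$$B = \mathrm{CRT}_m \cdot \big(\mathrm{CRT}_{m'}^{-1} \otimes I_{n/n'}\big) = \bigotimes_\ell \mathrm{CRT}_{m_\ell} \cdot \big(\mathrm{CRT}_{m'_\ell}^{-1} \otimes I_{\varphi(m_\ell)/\varphi(m'_\ell)}\big), \quad (7)$$

where the second equality follows by the mixed-product property and the commutativity (up to row and column permutations) of the Kronecker product. To see the first equality, notice that for any $\vec{a} \in K'^{n/n'}$ defining $a = \langle \vec{a}, \vec{b} \rangle \in K$, the matrix $(\mathrm{CRT}_{m'}^{-1} \otimes I)$ maps from (a suitable permutation of) the concatenated embeddings $\sigma'(\vec{a})$ to a vector $\vec{c} \in \mathbb{Q}^n$ of coefficients such that $\vec{a} = \langle \vec{c}, \vec{p}' \otimes I_{n/n'} \rangle$. In addition,

$$a = \langle \vec{a}, \vec{b} \rangle = \vec{c}^T \cdot (\vec{p}' \otimes I_{n/n'}) \cdot \vec{b} = \langle \vec{c}, \vec{p}' \otimes \vec{b} \rangle = \langle \vec{c}, \vec{p} \rangle.$$

Therefore, $\sigma(a) = \mathrm{CRT}_m \cdot \vec{c} = \mathrm{CRT}_m \cdot (\mathrm{CRT}_{m'}^{-1} \otimes I) \cdot \sigma'(\vec{a})$, as desired.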

Now, by the last expression in Equation (7), and because singular values are multiplicative under the Kronecker product, from now on we drop all the $\ell$ subscripts, and assume without loss of generality that $m$ and $m'$ are powers of the same prime $p$ (where possibly $m' = 1$). We analyze the singular values of $\mathrm{CRT}_m(\mathrm{CRT}_{m'}^{-1} \otimes I)$ for the cases $m' = 1$ and $m' > 1$. In the first case, clearly $\mathrm{CRT}_{m'} = I_1$, and it is shown in [16, Section 4] that the largest singular value of $\mathrm{CRT}_m$ is $\sqrt{m/2}$ if $m$ is even and $\sqrt{m}$ otherwise, and its smallest singular value is $\sqrt{m/p}$.
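These singular-value facts are easy to spot-check numerically. The sketch below assumes NumPy is available and uses hypothetical function names; it builds $\mathrm{CRT}_m$ for a prime power $m$ exactly as defined above, taking $\omega_m = e^{2\pi i/m}$, and compares its extreme singular values with $\sqrt{m/2}$ or $\sqrt{m}$ and with $\sqrt{m/p}$. It is a sanity check for small $m$, not a substitute for the proof in [16].

```python
import numpy as np
from math import gcd, sqrt


def crt_matrix(m):
    """CRT_m for a prime power m: entry omega_m^(i*j), rows i in Z_m^*, columns j in [phi(m)]."""
    units = [i for i in range(1, m) if gcd(i, m) == 1]
    return np.array([[np.exp(2j * np.pi * i * j / m) for j in range(len(units))] for i in units])


def smallest_prime_factor(m):
    return next(p for p in range(2, m + 1) if m % p == 0)


def predicted_extremes(m):
    """(largest, smallest) singular values of CRT_m as stated in the text."""
    p = smallest_prime_factor(m)
    largest = sqrt(m / 2) if m % 2 == 0 else sqrt(m)
    return largest, sqrt(m / p)


def observed_extremes(m):
    s = np.linalg.svd(crt_matrix(m), compute_uv=False)
    return float(s.max()), float(s.min())
```

For example, $m = 9$ gives singular values $3$ and $\sqrt{3}$, while every power of two gives a scaled unitary matrix.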

For the case $m' > 1$, it follows from the decompositions given in [16, Section 3] that, up to some row permutation,

$$\mathrm{CRT}_m = \sqrt{m/p} \cdot Q \cdot (\mathrm{CRT}_p \otimes I_{m/p})$$

for some unitary matrix $Q$, and similarly for $\mathrm{CRT}_{m'}$. Then a routine calculation using elementary properties of the Kronecker product reveals that $\mathrm{CRT}_m(\mathrm{CRT}_{m'}^{-1} \otimes I)$ is some unitary matrix scaled by a $\sqrt{m/m'}$ factor, so all its singular values are $\sqrt{m/m'}$. This completes the proof of Lemma 2.6.
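The whole of Lemma 2.6 can likewise be spot-checked numerically. The sketch below (again assuming NumPy; names hypothetical) builds $\vec{b}$ from Equation (6), writing each $b_j$ as a power $\zeta_m^{e_j}$, and forms $B$ directly from Equation (4): row $i \in \mathbb{Z}_m^*$ corresponds to the embedding $\zeta_m \mapsto \omega_m^i$, whose restriction to $K'$ is the embedding indexed by $i \bmod m'$, so column $(j, i')$ has entry $\omega_m^{i e_j}$ when $i \equiv i' \pmod{m'}$ and $0$ otherwise. It then compares the extreme singular values with $\sqrt{\hat{m}/m'}$ and $\sqrt{m/(r m')}$.

```python
import numpy as np
from math import gcd, prod, sqrt


def factor(m):
    """Prime factorization of m as {prime: exponent}."""
    f, p = {}, 2
    while m > 1:
        while m % p == 0:
            f[p] = f.get(p, 0) + 1
            m //= p
        p += 1
    return f


def rad(m):
    return prod(factor(m))  # product of the distinct primes dividing m


def basis_exponents(m, mp):
    """Exponents e_j with b_j = zeta_m^{e_j}, the R'-basis of R from Eq. (6); requires mp | m."""
    fm, fmp = factor(m), factor(mp)
    exps = [0]
    for p, a in fm.items():
        ml, mpl = p ** a, p ** fmp.get(p, 0)
        local = range(ml // mpl) if mpl > 1 else range(ml - ml // p)  # [m_l/m'_l] or [phi(m_l)]
        step = m // ml  # zeta_{m_l} = zeta_m^{m/m_l}
        exps = [(e + step * k) % m for e in exps for k in local]
    return exps


def lemma26_matrix(m, mp):
    """B with sigma(a) = B * sigma'(a_vec): rows i in Z_m^*, columns (j, i') with i' in Z_{m'}^*."""
    exps = basis_exponents(m, mp)
    rows = [i for i in range(1, m) if gcd(i, m) == 1]
    cols = [(e, ip) for e in exps for ip in range(mp) if gcd(ip, mp) == 1]
    # sigma_i restricted to K' is sigma'_{i mod m'}, so each row sees a single embedding of K'
    return np.array([[np.exp(2j * np.pi * i * e / m) if i % mp == ip else 0
                      for (e, ip) in cols] for i in rows])


def predicted(m, mp):
    """(s_1(B), s_n(B)) as stated in Lemma 2.6."""
    mhat = m // 2 if m % 2 == 0 and mp % 2 == 1 else m
    r = rad(m) // rad(mp)
    return sqrt(mhat / mp), sqrt(m / (r * mp))
```

For example, $(m, m') = (15, 5)$ has $r = 3$ and gives singular values $1$ and $\sqrt{3}$, while $(9, 3)$ has $r = 1$ and gives a unitary matrix scaled by $\sqrt{3}$.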
2.2. Homomorphic Cryptosystems


In ring-LWE-based cryptosystems for arbitrary cyclotomics [16] (generalizing those of [15,4,3]), the plaintext space is $R_p$ for some integer $p \ge 2$ that is coprime with all the odd primes dividing $m$. We assume that $p$ is prime, which is without loss of generality by the Chinese Remainder Theorem. Ciphertexts are elements of $(R^\vee_q)^2$ for some integer $q$ that is coprime with $p$, and the secret key is some $s \in R$. A ciphertext $c = (c_0, c_1) \in (R^\vee_q)^2$ that encrypts a plaintext $b \in R_p$ with respect to $s$ satisfies the decryption relation

$$c_0 + c_1 \cdot s = e \pmod{qR^\vee} \quad (8)$$

for some sufficiently short $e \in R^\vee$ such that $t \cdot e = b \pmod{pR}$. (Recall that $R^\vee = t^{-1}R$ for some $t \in R$, so $t \cdot e \in R$.) We refer to $e$ as the noise of the ciphertext. Throughout this work we implicitly assume that the modulus $q$ is large enough relative to $\|e\|$, so that $[c_0 + c_1 \cdot s]_q = e \in R^\vee$ (see Section 2.1.4 above). Therefore, the decryption algorithm can simply compute $e$ and output $t \cdot e \bmod pR$. As shown in [4,3,16], this system (augmented by some additional public values, for greater efficiency) supports additive and multiplicative homomorphisms.
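The decryption relation can be exercised on a toy instance. The sketch below is hypothetical and insecure: it takes $m = 2n$ a power of two, where $R = \mathbb{Z}[x]/(x^n + 1)$ and $R^\vee = n^{-1}R$, so $t = n$ works. Multiplying Equation (8) by $t$ gives $C_i = t \cdot c_i \in R_q$ and $E = t \cdot e \in R$ with $C_0 + C_1 s \equiv E \pmod{qR}$ and $E \equiv b \pmod{pR}$, so the sketch works entirely in $R$ with these scaled values. Noise is drawn uniformly from a small range rather than from a Gaussian.

```python
import random


def negacyclic_mul(a, b, q):
    """Product in Z_q[x]/(x^n + 1) of two length-n coefficient lists."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj  # x^n = -1
    return [c % q for c in out]


def centered(x, q):
    r = x % q
    return r - q if r > q // 2 else r


def encrypt(b, s, q, p, rng):
    """Return (C0, C1) over R_q with C0 + C1*s = E (mod q) for a short E = b (mod p)."""
    n = len(s)
    C1 = [rng.randrange(q) for _ in range(n)]
    E = [bi + p * rng.randint(-2, 2) for bi in b]  # E plays the role of t*e
    C1s = negacyclic_mul(C1, s, q)
    C0 = [(Ei - x) % q for Ei, x in zip(E, C1s)]
    return C0, C1


def decrypt(C, s, q, p):
    """Compute [C0 + C1*s]_q = E coefficient-wise, then output E mod p."""
    C0, C1 = C
    C1s = negacyclic_mul(C1, s, q)
    return [centered(x + y, q) % p for x, y in zip(C0, C1s)]


def add(C, D, q):
    """Homomorphic addition: component-wise sum of ciphertexts."""
    return tuple([(x + y) % q for x, y in zip(u, v)] for u, v in zip(C, D))
```

Adding two ciphertexts adds their noise terms, so the sum still decrypts correctly as long as the combined noise stays well below $q/2$.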


3.  The  Field-Switching  Procedure 

Our procedure performs the following operation. Given a big-field ciphertext $c \in (R^\vee_q)^2$ that encrypts a plaintext $b \in R_p$ with respect to a big-ring secret key $s \in R$, and a description of an $R'$-linear function $L\colon R_{\mathfrak{p}} \to R'_{\mathfrak{p}'}$ to apply to the plaintext (where recall that $\mathfrak{p}$ and $\mathfrak{p}'$ are the radicals of $p$ in $R$ and $R'$, respectively), it outputs a small-field ciphertext $c' \in (R'^\vee_q)^2$ that encrypts $b' = L(b) \in R'_{\mathfrak{p}'}$, with respect to some small-ring secret key $s' \in R'$. (Recall that Corollary 2.5 characterizes how $L$ corresponds to the induced function $\bar{L}\colon \mathbb{F}^{\ell} \to \mathbb{F}'^{\ell'}$ that is applied to the vector of finite field elements encoded by $b$.)

The procedure consists of the following three steps:

1. Switch to a small-ring secret key. We use the key-switching method from [5,3,16] to produce a ciphertext which is still over the big field $K$ and encrypts the same plaintext $b \in R_p$, but with respect to a secret key $s' \in R' \subseteq R$ belonging to the small subring.

2. Multiply by an appropriate (short) scalar. We multiply the components of the resulting ciphertext by a short element $r \in R$ that corresponds to the desired $R'$-linear function to be applied to the input plaintext $b$.

3. Map to the small field. We map the resulting big-field ciphertext (over $R^\vee_q$) to a small-field ciphertext (over $R'^\vee_q$) simply by taking the trace $\mathrm{Tr}_{K/K'}$ of its two components. The resulting ciphertext will still be with respect to the small-ring secret key $s' \in R'$, but will encrypt the plaintext $b' = L(b) \in R'_{\mathfrak{p}'}$.

Note that Steps 2 and 3 can be repeated multiple times on the same ciphertext (from Step 1), to apply several different $R'$-linear functions. In this way, the entire input plaintext can be preserved, but in a decomposed form.
3.1. Step 1: Switching to a Small-Ring Secret Key

To switch to a small-field secret key, we publish a “key-switching hint,” which essentially encrypts the big-ring secret key $s \in R$ under the small-ring key $s' \in R'$, using ciphertexts over the big field. Note that encrypting $s$ under a small-ring secret key $s'$ has security implications, since the dimension of the underlying RLWE problem is smaller. In our case, though, the ultimate goal is to switch to a ciphertext over the smaller field, so we will not lose any additional security by publishing the hint. Indeed, we show below that assuming the hardness of the decision RLWE problem in the small field, the key-switching hint reveals nothing about the big-ring secret key. The essence of that claim is Lemma 3.1 below, which says (informally) that RLWE in the big field, with secret chosen in the small ring $R' \subseteq R$, is no easier than RLWE in the small field.

Ring-LWE. The ring-LWE (RLWE) problem [15] (in $K$) with continuous error is parameterized by a modulus $q$, a “secret distribution” $\nu$ over $R$, and an “error distribution” $\psi$ over $K$, which is usually a Gaussian (in the canonical embedding) and is therefore concentrated on short elements.⁷ For $s \in R$, define the distribution $A_{s,\psi}$ that is sampled by choosing $a \in R^\vee_q$ uniformly at random, choosing $e \leftarrow \psi$, and outputting the pair $(a, \beta = a \cdot s + e \bmod qR^\vee) \in R^\vee_q \times K/qR^\vee$. One equivalent form of the (average-case) decision $\mathrm{RLWE}_{q,\psi,\nu}$ problem (in $K$) is, given some $\ell$ pairs $(a_i, \beta_i) \in R^\vee_q \times K/qR^\vee$, to distinguish between the following two cases: in one case, the pairs are chosen independently from $A_{s,\psi}$ for a random $s \leftarrow \nu$ (which remains the same for all samples); in the other case, the pairs are all independent and uniformly random over $R^\vee_q \times K/qR^\vee$. For appropriate parameters $q$, $\psi$, $\nu$ and $\ell$, solving this decision problem with non-negligible distinguishing advantage is as hard as approximating the shortest vector problem on ideal lattices in $R$, via a quantum reduction. See [15,16] for precise statements and further details.

Let $\vec{b}^\vee = (b^\vee_j)_{j \in [n/n']}$ be any $R'^\vee$-basis of $R^\vee$, and hence a $K'$-basis of $K$. Then for any error distribution $\psi'$ over $K'$, we can define an error distribution $\psi$ over $K$ as $\psi = \langle (\psi')^{n/n'}, \vec{b}^\vee \rangle$, i.e., a sample from $\psi$ is generated by choosing independent $e_j \leftarrow \psi'$ (for $j \in [n/n']$) and outputting $e = \langle \vec{e}, \vec{b}^\vee \rangle \in K$.

Lemma 3.1. Let $\psi'$ be an error distribution over $K'$, and let $\psi = \langle (\psi')^{n/n'}, \vec{b}^\vee \rangle$ be the error distribution over $K$ as described above. If the decision $\mathrm{RLWE}_{q,\psi',\nu'}$ problem (in $K'$) is hard for some distribution $\nu'$ over $R' \subseteq R$, then the decision $\mathrm{RLWE}_{q,\psi,\nu'}$ problem (in $K$) is also hard.

Although the lemma holds for any $R'^\vee$-basis of $R^\vee$, it is most useful with a basis having “good geometric properties.” Specifically, in our case we need the property that if $\psi'$ is concentrated on short elements of $K'$, then $\psi$ is similarly concentrated on short elements of $K$. Such a basis $\vec{b}^\vee$ is constructed in Lemma 2.6 of Section 2.1.5. For example, if $\psi'$ is a continuous (spherical) Gaussian with parameter $s$ and $r = \mathrm{rad}(m)/\mathrm{rad}(m') = 1$, then $\psi$ is a spherical Gaussian with parameter $s\sqrt{m'/m} = s\sqrt{n'/n}$.⁸

⁷ Again, to be completely formal, a Gaussian should be defined over $K_{\mathbb{R}} = K \otimes_{\mathbb{Q}} \mathbb{R}$; see Footnote 6.

⁸ Note that the factor $\sqrt{n'/n} < 1$ does not really amount to any effective decrease in the noise, because the “sparsity” of $R'^\vee$ versus $R^\vee$ is greater by a corresponding factor.
Proof. It suffices to give an efficient, deterministic reduction that takes $n/n'$ pairs $(a_j, \beta_j) \in R'^\vee_q \times K'/qR'^\vee$ and outputs a single pair $(a, \beta) \in R^\vee_q \times K/qR^\vee$, with the following properties: if the pairs $(a_j, \beta_j)$ are i.i.d. according to $A_{s',\psi'}$ for some $s' \in R'$, then $(a, \beta)$ is distributed according to $A_{s',\psi}$; and if the pairs $(a_j, \beta_j)$ are independent and uniformly random, then $(a, \beta)$ is uniformly random. The reduction simply outputs $(a = \langle \vec{a}, \vec{b}^\vee \rangle, \beta = \langle \vec{\beta}, \vec{b}^\vee \rangle)$, where $\vec{a} = (a_j)_j$ and $\vec{\beta} = (\beta_j)_j$.

Since $\vec{b}^\vee$ is an $R'^\vee$-basis of $R^\vee$ and hence an $R'^\vee_q$-basis of $R^\vee_q$, it is immediate that the reduction maps the uniform distribution to the uniform distribution. On the other hand, if the samples $(a_j, \beta_j)$ are drawn from $A_{s',\psi'}$, i.e., $\beta_j = a_j \cdot s' + e_j \bmod qR'^\vee$ for $e_j \leftarrow \psi'$, then $a$ is still uniformly random, and

$$\beta = \langle \vec{\beta}, \vec{b}^\vee \rangle = \langle \vec{a}, \vec{b}^\vee \rangle \cdot s' + \langle \vec{e}, \vec{b}^\vee \rangle = a \cdot s' + e \pmod{qR^\vee},$$

where $\vec{e} = (e_j)_j$ and $e$ has distribution $\psi$. This completes the proof. □
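The reduction in this proof can be exercised in a toy setting. The sketch below uses hypothetical names and insecure parameters, and draws errors uniformly from a small range instead of from $\psi'$. It takes $m = 2n$ and $m' = 2n'$ powers of two with $d = n/n'$, so that $R = \mathbb{Z}[x]/(x^n+1)$, $R' = \mathbb{Z}[y]/(y^{n'}+1)$ with $y = x^d$, $R^\vee = n^{-1}R$ and $R'^\vee = n'^{-1}R'$. It uses the $R'^\vee$-basis $(x^j/d)_{j \in [d]}$ of $R^\vee$, and multiplies small-field samples by $t' = n'$ and big-field samples by $t = n$ so that all coordinates are integers; in these coordinates the map $\vec{a} \mapsto \langle \vec{a}, \vec{b}^\vee \rangle$ simply interleaves coefficient vectors.

```python
import random


def negacyclic_mul(a, b, q):
    """Product in Z_q[x]/(x^n + 1) of two length-n coefficient lists."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj  # x^n = -1
    return [c % q for c in out]


def embed(a_small, d):
    """R' -> R via y -> x^d, where R' = Z[y]/(y^{n'}+1) and R = Z[x]/(x^{n'd}+1)."""
    out = [0] * (len(a_small) * d)
    out[::d] = a_small
    return out


def combine(parts, d, q):
    """(t-scaled) <parts, b_dual> = sum_j embed(parts[j]) * x^j in R_q; a bijection (R'_q)^d -> R_q."""
    n = len(parts[0]) * d
    out = [0] * n
    for j, part in enumerate(parts):
        out[j::d] = [c % q for c in part]
    return out


def reduce_samples(samples, d, q):
    """Map d small-field samples (a_j, beta_j) to one big-field sample (a, beta), as in the proof."""
    a_parts, beta_parts = zip(*samples)
    return combine(a_parts, d, q), combine(beta_parts, d, q)
```

The check below draws $d$ small-field samples under one secret $s'$ and confirms that the combined pair satisfies $\beta = a \cdot s' + e$ over $R_q$, with $e$ the interleaved error.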

Key switching. In [5,3,16] it is shown how, given an $s \in R$ and sufficiently many RLWE samples (over $K$) with short noise and any secret $s' \in R$, it is possible to generate a “key-switching hint” with the following functionality: given the hint and any valid ciphertext $c$ (over $K$) encrypted under $s$ and with sufficiently short noise, it is possible to efficiently generate a ciphertext $c'$ (also over $K$) with short noise encrypted under $s'$. Moreover, the hint is indistinguishable from uniformly random over its domain (even given $s$), assuming that the RLWE samples are.

For our transformation, we apply Lemma 3.1 using the “good basis” $\vec{b}^\vee$ from Lemma 2.6, thus obtaining RLWE samples over $K$ relative to the secret $s' \in R' \subseteq R$, with noise distribution $\psi$ which is concentrated on short vectors, and with security based on the hardness of the $\mathrm{RLWE}_{q,\psi',\nu'}$ problem in $K'$. We then construct the key-switching hint from these samples as described in [16, Section 8.3].

3.2.  Steps  2  and  3:  Mapping  to  the  Small  Field 

Our goal now is to transform a valid big-field ciphertext $c = (c_0, c_1) \in (R^\vee_q)^2$, which encrypts some $b \in R_p$ with respect to some secret key $s' \in R' \subseteq R$, into a small-field ciphertext $c' = (c'_0, c'_1) \in (R'^\vee_q)^2$ that encrypts the related plaintext $b' = L(b)$ with respect to the same secret key $s'$, where $L\colon R_{\mathfrak{p}} \to R'_{\mathfrak{p}'}$ is any desired $R'$-linear function.

The  process  works  as  follows: 

1. Since $L$ is $R'$-linear, by the discussion at the end of Section 2.1.3 and in Section 2.1.4, we can find some $r^\vee \in t'R^\vee$ such that $L(a) = \mathrm{Tr}_{K/K'}(r^\vee \cdot a) \bmod \mathfrak{p}'$.

2. We then find a short representative $r \in R$ of the coset $(t/t')\, r^\vee + pR \in R_p$, using a “good” basis of $pR$ (i.e., one that has small singular values under $\sigma$, e.g., the “powerful” basis as constructed in Section 2.1.5).

The chosen $r$ defines the $R'$-linear function $L^\vee\colon R^\vee \to R'^\vee$ of the form $L^\vee(a^\vee) = \mathrm{Tr}_{K/K'}(r \cdot a^\vee)$, whose induced function from $R^\vee_{\mathfrak{p}}$ to $R'^\vee_{\mathfrak{p}'}$ satisfies

$$t' \cdot L^\vee(a^\vee) = L(t \cdot a^\vee) \pmod{\mathfrak{p}'}. \quad (9)$$

3. We obtain our small-field ciphertext by applying $L^\vee$ (or more precisely, the induced function from $R^\vee_q$ to $R'^\vee_q$) to $c_0, c_1$, setting

$$c'_i = L^\vee(c_i) = \mathrm{Tr}_{K/K'}(r \cdot c_i) \in R'^\vee_q, \quad i = 0, 1.$$

Lemma 3.2. The ciphertext $c' = (c'_0, c'_1)$ is an encryption of $b' = L(b) \in R'_{\mathfrak{p}'}$ under secret key $s' \in R'$, with noise $e' = L^\vee(e) \in R'^\vee$ of length $\|e'\| \le \|e\| \cdot \|r\|_\infty \cdot \sqrt{n/n'}$, where $e$ is the noise in the original ciphertext $c$.

We note that the factor $\sqrt{n/n'}$ in the bound on $\|e'\|$ does not actually amount to any effective increase in the noise, because the dimension has decreased by a corresponding factor, and hence the size of $e'$ relative to $R'^\vee$ remains the same as that of $e$ relative to $R^\vee$. More precisely, the original ciphertext $c$ decrypts correctly if $q > 2\sqrt{n}\,\|e\|$, whereas $c'$ decrypts correctly if $q > 2\sqrt{n'}\,\|e'\|$ (see Section 2.1.4). Therefore, the only practical increase in the noise is due solely to $\|r\|_\infty$.

Proof. We need to show three things: that $\|e'\|$ is bounded as claimed, that $c'_0 + c'_1 \cdot s' = e' \pmod{qR'^\vee}$, and that $t' \cdot e' = b' = L(b) \pmod{\mathfrak{p}'}$.

1. The first claim follows immediately by Corollary 2.2 and the inequality $\|r \cdot e\| \le \|r\|_\infty \cdot \|e\|$.

2. For the second claim, recall that $c_0 + c_1 \cdot s' = e \pmod{qR^\vee}$. Then because the induced function $L^\vee\colon R^\vee_q \to R'^\vee_q$ is $R'$-linear and $s' \in R'$, we have

$$c'_0 + c'_1 \cdot s' = L^\vee(c_0 + c_1 \cdot s') = L^\vee(e) = e' \pmod{qR'^\vee}.$$

3. For the last claim, because $t \cdot e = b \pmod{pR}$ and by Equation (9), we have

$$t' \cdot e' = t' \cdot L^\vee(e) = L(t \cdot e) = L(b) \pmod{\mathfrak{p}'}. \qquad \square$$
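Lemma 3.2 can also be checked on a toy instance. The sketch below (hypothetical names, insecure parameters, small uniform noise) again uses power-of-two cyclotomics with $t = n$, $t' = n'$ and $d = n/n'$, and represents each ciphertext by its $t$-scaled version $C = t \cdot c$ over $R_q$, so Equation (8) becomes $C_0 + C_1 s' = t \cdot e \pmod{qR}$. Since $\mathrm{Tr}_{K/K'}(x^k) = d \cdot x^k$ when $d \mid k$ and $0$ otherwise (for $0 \le k < n$), the scaled small-field component $t' \cdot \mathrm{Tr}_{K/K'}(r \cdot c_i) = \mathrm{Tr}_{K/K'}(r C_i)/d$ just keeps every $d$th coefficient of $r C_i$. Step 1 is skipped by encrypting directly under a key $s' \in R'$ embedded in $R$.

```python
import random


def negacyclic_mul(a, b, q):
    """Product in Z_q[x]/(x^n + 1) of two length-n coefficient lists."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj  # x^n = -1
    return [c % q for c in out]


def centered(x, q):
    r = x % q
    return r - q if r > q // 2 else r


def embed(a_small, d):
    """R' -> R via y -> x^d."""
    out = [0] * (len(a_small) * d)
    out[::d] = a_small
    return out


def encrypt(b, s, q, p, rng):
    """t-scaled toy ciphertext (C0, C1) over R_q with C0 + C1*s = E (mod q), E = b (mod p), E short."""
    n = len(s)
    C1 = [rng.randrange(q) for _ in range(n)]
    E = [bi + p * rng.randint(-2, 2) for bi in b]
    C1s = negacyclic_mul(C1, s, q)
    return [(x - y) % q for x, y in zip(E, C1s)], C1


def decrypt(C, s, q, p):
    C0, C1 = C
    C1s = negacyclic_mul(C1, s, q)
    return [centered(x + y, q) % p for x, y in zip(C0, C1s)]


def field_switch(C, r, d, q):
    """Steps 2-3 in t-scaled coordinates: multiply by r, then keep every d-th coefficient (scaled trace)."""
    return tuple(negacyclic_mul(r, Ci, q)[::d] for Ci in C)
```

With the hypothetical choice $r = 1 + x$, the small-field ciphertext decrypts to the even-indexed coefficients of $r \cdot b \bmod p$, which is an $R'$-linear function of $b$.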


3.3. Applying the Field-Switching Procedure

A typical application of the field-switching procedure during homomorphic evaluation of some circuit will begin with a big-field ciphertext that encrypts an array of plaintext values in the subfield $\mathbb{F}'$, as embedded in $\mathbb{F}$.⁹ The above procedure is then applied to decompose the ciphertext into a number of small-field ciphertexts, each encrypting a subset of the plaintext values. Since big-field ciphertexts have room for $\ell$ plaintext elements, but small-field ciphertexts can only hold $\ell'$ elements, we may need up to $\ell/\ell'$ small-field ciphertexts to hold all the plaintext values that we are interested in. That is, we apply our field-switching transformation using the $\ell'$-fold concatenations $L_i$ of the $\mathbb{F}'$-linear selection functions $\mathbb{F}^{\ell/\ell'} \to \mathbb{F}'$, $i \in [\ell/\ell']$, where $L_i$ just selects the $i$th value (in $\mathbb{F}'$).¹⁰

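⁹ For example, when evaluating AES homomorphically, we would have plaintext values from $\mathbb{F}_{2^8}$ even though $\mathbb{F}$ may be a larger field such as $\mathbb{F}_{2^{16}}$ or $\mathbb{F}_{2^{24}}$, etc.

¹⁰ More precisely, $L_i(\vec{a}) = \mathrm{Tr}_{\mathbb{F}/\mathbb{F}'}(\rho \cdot a_i)$ for some $\rho \in \mathbb{F}$ such that $\mathrm{Tr}_{\mathbb{F}/\mathbb{F}'}(\rho) = 1$, so that $L_i(\vec{a}) = a_i$ for any $a_i \in \mathbb{F}'$, by $\mathbb{F}'$-linearity.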
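The trace-based selection mechanism can be checked in a small case. The sketch below (hypothetical names) takes $\mathbb{F} = \mathbb{F}_{16}$, built as $\mathbb{F}_2[x]/(x^4 + x + 1)$, over the subfield $\mathbb{F}' = \mathbb{F}_4$, rather than the AES-sized fields of the example; there $[\mathbb{F} : \mathbb{F}'] = 2$ and $\mathrm{Tr}_{\mathbb{F}/\mathbb{F}'}(a) = a + a^4$. It finds some $\rho$ with $\mathrm{Tr}(\rho) = 1$ and confirms that $a \mapsto \mathrm{Tr}(\rho \cdot a)$ fixes every element of $\mathbb{F}'$.

```python
def gf16_mul(a, b):
    """Multiply in GF(16) = GF(2)[x]/(x^4 + x + 1); elements are 4-bit integers."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= 0b10011  # reduce by x^4 + x + 1
    return res


def gf16_pow(a, e):
    res = 1
    for _ in range(e):
        res = gf16_mul(res, a)
    return res


def trace_to_gf4(a):
    """Tr_{F/F'}(a) = a + a^4 for F = GF(16), F' = GF(4) (addition is XOR)."""
    return a ^ gf16_pow(a, 4)


def select(vec, i, rho):
    """L_i(vec) = Tr_{F/F'}(rho * vec[i]); equals vec[i] whenever vec[i] lies in F' and Tr(rho) = 1."""
    return trace_to_gf4(gf16_mul(rho, vec[i]))
```

Here the subfield $\mathbb{F}_4$ consists of the elements $0, 1, 6, 7$ (those with $a^4 = a$), and the search returns $\rho = 2$, i.e. $\rho = x$.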
Referring to Figure 1 for an example, the big-field ciphertext holds (up to) six plaintext values, and each small-field ciphertext can hold two values, with the big-field plaintext “slots” corresponding to $\mathfrak{p}_1, \mathfrak{p}_{15}, \mathfrak{p}_{22}$ lying over the small-field plaintext slot of $\mathfrak{p}'_1$, and the big-field slots corresponding to $\mathfrak{p}_3, \mathfrak{p}_{17}, \mathfrak{p}_{31}$ lying over the small-field plaintext slot of $\mathfrak{p}'_3$. Then we can produce three small-field ciphertexts, using the three selection functions

$$(x_1, x_{15}, x_{22}, x_3, x_{17}, x_{31}) \mapsto (x_1, x_3),$$

$$(x_1, x_{15}, x_{22}, x_3, x_{17}, x_{31}) \mapsto (x_{15}, x_{17}),$$

$$(x_1, x_{15}, x_{22}, x_3, x_{17}, x_{31}) \mapsto (x_{22}, x_{31}).$$
Abstract. We describe a method to bootstrap a packed BGV ciphertext which does not depend (as much) on any special properties of the plaintext and ciphertext moduli. Prior “efficient” methods such as that of Gentry et al (PKC 2012) required a ciphertext modulus $q$ which was close to a power of the plaintext modulus $p$. Avoiding this requirement enables our method to be applied in a larger number of situations. Also, unlike previous methods, our depth grows only as $O(\log p + \log\log q)$ as opposed to the $\log q$ of previous methods. Our basic bootstrapping technique makes use of a representation of the group $\mathbb{Z}_q^+$ over the finite field $\mathbb{F}_p$ (either based on polynomials or elliptic curves), followed by polynomial interpolation of the reduction mod $p$ map over the coefficients of the algebraic group.

This technique is then extended to the full BGV packed ciphertext space, using a method whose depth depends only logarithmically on the number of packed elements. This method may be of interest as an alternative to the method of Alperin-Sheriff and Peikert (CRYPTO 2013). To aid efficiency we utilize the ring/field switching technique of Gentry et al (SCN 2012, JCS 2013).


1  Introduction 

Since the invention of Fully Homomorphic Encryption (FHE) by Gentry in 2009 [14,15], one of the main open questions in the field has been how to “bootstrap” a Somewhat Homomorphic Encryption (SHE) scheme into an FHE scheme. Recall that an SHE scheme is one which can evaluate circuits of a limited multiplicative depth, whereas an FHE scheme is one which can evaluate circuits of arbitrary depth. Gentry's bootstrapping technique is the only known way of obtaining unbounded FHE.

The ciphertexts of all known SHE schemes include some noise to ensure security, and unfortunately this noise grows as more and more homomorphic operations are performed, until it is so large that the ciphertext will no longer decrypt correctly. In a nutshell, bootstrapping “refreshes” a ciphertext that cannot support any further homomorphic operation by homomorphically decrypting it, and obtaining in this way a new encryption of the same plaintext, but with smaller noise. This is possible if the underlying SHE scheme has enough homomorphic capacity to evaluate its own decryption algorithm. Bootstrapping is computationally very expensive and it represents the main bottleneck in FHE constructions.
Several SHE schemes, with different bootstrapping procedures, have been proposed in the past few years [1,2,4,6,7,8,14,15,10,18,19,32]. The most efficient are ones which allow SIMD style operations, by packing a number of plaintext elements into independent “slots” in the plaintext space. The most studied of such “SIMD friendly” schemes is the BGV scheme [5], based on the Ring-LWE Problem [25].


Prior Work on Bootstrapping. In almost all the SHE schemes supporting bootstrapping, decryption is performed by evaluating some linear function $D_c$, dependent on the ciphertext $c$, on the secret key $\mathfrak{sk}$ modulo some integer $q$, and then reducing the result modulo some prime $p$, i.e. $\mathsf{dec}(c, \mathfrak{sk}) = ((D_c(\mathfrak{sk}) \bmod q) \bmod p)$. Given an encryption of the secret key, bootstrapping consists of evaluating the above decryption formula homomorphically. One can divide the bootstrapping of all efficient currently known SHE schemes into three distinct sub-problems.

1. The first problem is to homomorphically evaluate the reduction (mod $p$)-map on the group $\mathbb{Z}_q^+$ (see Fig. 1), where for the domain one takes representatives centered around zero. To do this the group is first mapped to a set $G$ in which one can perform operations native to the homomorphic cryptosystem. In other words we first need to specify a representation, $\mathsf{rep}\colon \mathbb{Z}_q^+ \to G$, which takes an integer in the range $(-q/2, \ldots, q/2]$ and maps it to the set $G$. The group operation on $\mathbb{Z}_q^+$ needs to induce a group operation on $G$ which can be evaluated homomorphically by the underlying SHE scheme. Then we describe the induced map $\mathsf{red}\colon G \to \mathbb{Z}_p$ as an algebraic operation, which can hence be evaluated homomorphically.

2. The second problem is to encode the secret key in a way that one can publicly, using a function dec-eval (decryption evaluation), create a set of ciphertexts which encrypt the required input to the function red.

3. Thirdly, one needs a method to extend this to packed ciphertexts.
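To make the first sub-problem concrete, here is the map red evaluated in the clear (a sketch with hypothetical names): reduce modulo $q$ to the representative in $(-q/2, q/2]$, then modulo $p$. A brute-force check shows that for $q = 27$ and $p = 3$ this map is additive on $\mathbb{Z}_q^+$, whereas for $q = 17$ it is not; in general it is a group homomorphism into $\mathbb{Z}_p$ exactly when $p$ divides $q$. This is one way to see why red cannot, in general, be computed by simply adding encrypted values in $\mathbb{Z}_p$.

```python
def centered(x, q):
    """Representative of x modulo q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r


def red(x, q, p):
    """The (mod p) map on Z_q^+, applied to the centered representative."""
    return centered(x, q) % p


def red_is_additive(q, p):
    """Brute-force test of red(x + y) == red(x) + red(y) (mod p) over all of Z_q."""
    return all(red(x + y, q, p) == (red(x, q, p) + red(y, q, p)) % p
               for x in range(q) for y in range(q))
```

For instance, with $q = 17$ and $p = 3$, $\mathsf{red}(8) + \mathsf{red}(8) \equiv 1$ but $\mathsf{red}(16) = \mathsf{red}(-1) = 2$.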


To solidify ideas we now expand on these problems in the context of the BGV scheme [5]. Recall that for BGV we have a set of $L + 1$ moduli, corresponding to the levels of the scheme, $q_0 < q_1 < \cdots < q_L$, and a (global) ring $R$, which is often the ring of integers of a cyclotomic number field. We let $p$ denote the (prime) plaintext modulus, i.e. the plaintexts will be elements in $R_p = R/pR$, and to ease notation we set $q = q_0$. The secret key $\mathfrak{sk}$ is a small element in $R$. A “fresh” ciphertext encrypting $\mu' \in R_p$ is an element $\mathfrak{ct}' = (c'_0, c'_1)$ in $R_{q_L}^2$ such that

$$\big((c'_0 + \mathfrak{sk} \cdot c'_1) \bmod q_L\big) \bmod p = \mu'.$$

After the evaluation of $L$ levels of multiplication one obtains a ciphertext $\mathfrak{ct} = (c_0, c_1)$ in $R_q^2$, encrypting a plaintext $\mu$, such that

$$\big((c_0 + \mathfrak{sk} \cdot c_1) \bmod q\big) \bmod p = \mu.$$

At this point, to perform further calculations one needs to bootstrap, or recrypt, the ciphertext to one of a higher level.

Assume for the moment that each plaintext only encodes a single element of $\mathbb{Z}_p$, i.e. each plaintext is a constant polynomial in the polynomial basis for $R_p$. To perform bootstrapping we need to place a “hint” in the public key $\mathfrak{pk}$ (usually an encryption of $\mathfrak{sk}$ at level $L$), which allows the following operations. Firstly, we can evaluate homomorphically a function dec-eval which takes $\mathfrak{ct}$ and the “hint”, and outputs a representation of the $\mathbb{Z}_q$ element corresponding to the constant term of the element $c_0 + \mathfrak{sk} \cdot c_1 \pmod q$. This representation is an encryption of an element in $G$, i.e. dec-eval also evaluates the rep map as well as the decryption map. Then we apply, homomorphically, the function red to this representation to obtain a fresh encryption of the plaintext. Since to homomorphically evaluate red we need the input to red to be defined over the plaintext space, the representation of $\mathbb{Z}_q$ must be defined over $\mathbb{F}_p$. One is then left with the task of extending such a procedure to packed ciphertexts.

In the original bootstrapping technique of Gentry [15], implemented in [16], the function dec-eval is obtained from a process of bit-decomposition. Thus the representation $G$ of $\mathbb{Z}_q$ is the bit-representation of an integer in the range $(-q/2, q/2]$, i.e. we use a representation defined over $\mathbb{F}_2$. The function to evaluate red is then the circuit which performs reduction modulo $p$. The extension of this technique to packed ciphertexts, in the context of the Smart-Vercauteren SIMD optimisations [29] of Gentry's SHE scheme, was given in [30]. Due to the use of bit-decomposition techniques this method is mainly suited to the case of $p = 2$, although one can extend it to other primes by applying a $p$-adic decomposition and then using an arithmetic circuit to evaluate the reduction modulo $p$ map.

In [18] the authors present a bootstrapping technique, primarily targeted at the BGV scheme, which does away with the need for evaluating the “standard” circuit for the reduction modulo $p$ map. This is done by choosing $q$ close to a power of $p$, i.e. one selects $q = p^t \pm a$ for some $t$ and a small value of $a$, typically $a \in \{-1, 1\}$. The paper [18] expands on this idea for the case of $p = 2$, but the authors mention it can clearly be extended to arbitrary $p$. The advantage is that the mapping red can now be expressed as algebraic formulae; in fact formulae of multiplicative depth $\log_2 q$. The operation dec-eval obtains the required representation for $\mathbb{Z}_q$ by mapping it into $\mathbb{Z}_{p^{t+1}}$. The resulting technique requires the extension of the modulus of the plaintext ring to $p^{t+1}$ (for which all the required properties of $R_p$ carry over, assuming that $p$ does not ramify). The extension to packed ciphertexts is performed using an elaborate homomorphic evaluation of the Fourier Transform.
To enable faster evaluation of the Fourier Transform step from [18], a method for ring/field switching is presented in [17]. Ring/field switching allows the ring $R$ to be changed to a subring $S$ (both for the ciphertext and plaintext spaces), which also yields general improvements in efficiency as ciphertext noise grows. In [1] this use of field switching is combined with the red map from [18] to obtain an asymptotically efficient bootstrapping method for BGV-style SHE schemes, although the resulting technique does not fully map to our blueprint, as $q = p^v$ for some value of $v$. In [28] this method is implemented, with surprisingly efficient runtimes, for the case of plaintext space $\mathbb{F}_2$, i.e. $p = 2$ and no plaintext SIMD-packing is supported.

In another line of work, the authors of [2] and [8] present a bootstrapping technique for the GSW [21] homomorphic encryption scheme. The GSW scheme is based on matrices, and this property is exploited in [2] by taking a matrix representation of $\mathbb{Z}_q$ and then expressing the map red via a very simple algebraic relationship on the associated matrices. In particular the authors represent elements of $\mathbb{Z}_q$ by matrices (of some large dimension) over $\mathbb{F}_p$.

Thus we see that almost all bootstrapping techniques require us to come up with a representation $G$ of $\mathbb{Z}_q$ for which there is an algebraic method over $\mathbb{F}_p$ to evaluate the induced mapping red, from the said representation of $\mathbb{Z}_q$, to $\mathbb{Z}_p$. Since SHE schemes usually have addition and multiplication as their basic homomorphic operations, this implies we are looking for representations of $\mathbb{Z}_q^+$ as a subgroup of an algebraic group over $\mathbb{F}_p$.

Our Contribution. We return to consider the Ring-LWE based BGV scheme, and we present a new bootstrapping technique with small depth growth, compared with previous methods, and which supports a larger choice of $p$ and $q$. Instead of concentrating on the case of plaintext moduli $p$ such that a power of $p$ is close to $q$, we look at a much larger class of plaintext moduli. Recall that the most efficient prior technique, based on [1] and [18], requires a method whose multiplicative depth is $O(\log q)$, and for which $q$ is close to a power of $p$. As $p$ increases, it becomes harder to select a suitable modulus $q$ which is simultaneously close to a power of $p$, of the correct size for the most efficient implementation (i.e. the smallest needed to ensure security), and has other properties related to efficiency (e.g. the ring $R_q$ has a double-CRT representation as in [20]).

To allow a wider selection for $p$ we utilize two “new” (for bootstrapping) representations of the ring $\mathbb{Z}_q$, in much the same way as [2] used an $\mathbb{F}_p$-matrix representation (a.k.a. a linear algebraic group) of $\mathbb{Z}_q^+$. The first one, used for much of this paper for ease of presentation, is based on a polynomial representation for $\mathbb{Z}_q^+$ over $\mathbb{F}_p$; the second one (which is less efficient but allows a greater freedom in selecting $q$) is based on a representation via elliptic curves. The evaluation of the mapping red using these representations can then be done in expected multiplicative depth $O(\log p + \log \log q)$, i.e. a much shallower circuit than used in prior works, using polynomial interpolation of the red map over the coefficients of the algebraic group.

To ensure this method works, and is efficient, we do not have completely free rein in selecting $q$ for the first polynomial representation. Whilst [18] required $q = p^t \pm a$, for a small value of $a$, we instead will require that $q$ divides

\[ \operatorname{lcm}_{i=1}^{t} \left( p^{k_i} - 1 \right), \]

for some pairwise co-prime values $k_i$. Even with this restriction, the freedom in selecting $q$ is much greater than for the method in [18], especially for large values of $p$. In the second representation, described in Section 7, we simply need to find elliptic curves over $\mathbb{F}_{p^{k_i}}$ whose group order is divisible by $e_i$, where $\prod_{i=1}^{t} e_i = q$. For the elliptic curve based version we do not need pairwise co-prime values of $k_i$. Indeed on setting $t = 1$ we simply need one curve $E(\mathbb{F}_{p^k})$ whose group order is divisible by $q$, which is highly likely to exist by the near uniform distribution of elliptic curve group orders in the Hasse interval.

Note also that, in the polynomial representation, one does not have complete freedom in selecting the $k_i$ values. If we let $E = \sum k_i$ and $M = \frac{1}{2} \sum k_i \cdot (k_i + 1)$, then the depth of the circuit to evaluate red (which is approximately $\log_2 \log_2 q - \log_2 \log_2 E$) will decrease as $E$ grows, but the number of multiplications required, which is a monotonically increasing function of $M$, will increase. Note, we can asymptotically make $M = O(\sum k_i \cdot \log k_i)$ using FFT techniques, or $M = O(\sum k_i^{\log_2 3})$ using Karatsuba based techniques, but in practice the $k_i$ will be too small to make such optimization fruitful. For the elliptic curve based version we replace the above $E$ by $E + 1$ and we replace $M$ by a constant multiple of $M$. However, the depth required by our elliptic curve based version increases.

Our method permits bootstrapping a certain number of packed ciphertexts in parallel, using a form of $p$-adic decomposition and a matrix representation of the ciphertext ring, combined with ring switching. The resulting depth depends only logarithmically on the number of packed ciphertexts.

Overview and paper organization. Here we give a brief overview of the paper. In Sections 2 and 3 we recall the basic algebraic background required for our construction, and the BGV SHE scheme from [5], respectively. Typically, the main technical difficulty in bootstrapping is to homomorphically evaluate, in an efficient way, the (mod $p$)-map on the group $\mathbb{Z}_q^+$. In Section 4 we describe a simple way to evaluate the (mod $p$)-map using a polynomial representation of the group $G$ in Fig. 1. In Section 5 we prepare to bootstrap packed ciphertexts and we show how to homomorphically evaluate a product of powers of SIMD vectors. In particular we calculate the depth and the number of multiplications required to compute this operation. Finally, in Section 6 we show how to bootstrap BGV ciphertexts. We use a matrix representation of the product of two elements in a ring and a single ring switching step in such a way that we can bootstrap a number, say $C$, of packed ciphertexts in one step. We describe the homomorphic evaluation of the decryption equation using the SIMD evaluation of the maps red and rep. Using the calculation of Section 5, we can compute the depth and the number of multiplications necessary to bootstrap $C$ packed ciphertexts in parallel. In Section 7 we give a different instantiation of our method using elliptic curves.

2  Preliminaries 

Throughout this work vectors are written using bold lower-case letters, whereas bold upper-case letters are used for matrices. We denote by $\mathrm{Mat}_{a \times b}(K)$ the set of $a \times b$ dimensional matrices with entries in $K$. For an integer modulus $q$, we let $\mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$ denote the quotient ring of integers modulo $q$, and $\mathbb{Z}_q^+$ its additive group. This notation naturally extends to the localisation $R_q$ of a ring $R$ at $q$.

2.1  Algebraic  Background 

Let $m$ be a positive integer. We define the $m$th cyclotomic field to be the field $K = \mathbb{Q}[X]/\Phi_m(X)$, where $\Phi_m(X)$ is the $m$th cyclotomic polynomial. $\Phi_m(X)$ is a monic irreducible polynomial over the rationals, and $K$ is a field extension of degree $N = \phi(m)$ over $\mathbb{Q}$ since $\Phi_m(X)$ has degree $N$. Let $\zeta_m$ be an abstract primitive $m$th root of unity; we have that $K = \mathbb{Q}(\zeta_m)$ by identifying $\zeta_m$ with $X$. In the same way, let us denote by $R$ the $m$th cyclotomic ring $\mathbb{Z}[\zeta_m] \cong \mathbb{Z}[X]/\Phi_m(X)$, with “power basis” $\{1, \zeta_m, \dots, \zeta_m^{N-1}\}$. The complex embeddings of $K$ are $\sigma_i : K \to \mathbb{C}$, defined by $\sigma_i(X) = \zeta_m^i$, $i \in \mathbb{Z}_m^*$. In particular $K$ is Galois over $\mathbb{Q}$ and $\mathrm{Gal}(\mathbb{Q}(\zeta_m)/\mathbb{Q}) \cong \mathbb{Z}_m^*$. As a consequence we can define the $\mathbb{Q}$-linear (field) trace $\mathrm{Tr}_{K/\mathbb{Q}} : K \to \mathbb{Q}$ as the sum of the embeddings $\sigma_i$, i.e. $\mathrm{Tr}_{K/\mathbb{Q}}(a) = \sum_{i \in \mathbb{Z}_m^*} \sigma_i(a) \in \mathbb{Q}$. Concretely, these embeddings map $\zeta_m$ into each of its conjugates, and they are the only field homomorphisms from $K$ to $\mathbb{C}$ that fix every element of $\mathbb{Q}$. The canonical embedding $\sigma : K \to \mathbb{C}^N$ is the concatenation of all the complex embeddings, i.e. $\sigma(a) = (\sigma_i(a))_{i \in \mathbb{Z}_m^*}$, $a \in K$.

Looking ahead, we will use the ring $R$ and its localisation $R_q$, for some modulus $q$. Given a polynomial $a \in R$, we denote by $\|a\|_\infty = \max_{0 \le i \le N-1} |a_i|$ the standard $\ell_\infty$-norm. All estimates of noise are taken with respect to the canonical embedding norm $\|a\|_\infty^{\mathrm{can}} = \|\sigma(a)\|_\infty$, $a \in R$. When considering short elements in $R_q$, we define short in terms of the following quantity:

\[ |a|_q^{\mathrm{can}} = \min \{ \|a'\|_\infty^{\mathrm{can}} : a' \in R \text{ and } a' \equiv a \bmod q \}. \]

To map from norms in the canonical embedding to norms on the coefficients of the polynomial defining the elements of $R$, we have $\|a\|_\infty \le c_m \cdot \|a\|_\infty^{\mathrm{can}}$, where $c_m$ is the ring constant. For more details about $c_m$ see [13]. Note, if the dual basis techniques of [26] are used, then one can remove the dependence on $c_m$. However, for ease of exposition we shall use only the polynomial basis in this work.
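To make the canonical embedding and the trace concrete, the following Python sketch evaluates $\sigma(a)$ numerically for an element $a \in R$ given by its power-basis coefficients, and derives $\mathrm{Tr}_{K/\mathbb{Q}}(a)$, $\|a\|_\infty$ and $\|a\|_\infty^{\mathrm{can}}$ from it. It is a floating-point illustration with hypothetical helper names of our own (not part of any library), so the trace is rounded to recover its integer value.

```python
import cmath
import math
from math import gcd


def unit_indices(m):
    """The indices i in Z_m^*, i.e. 1 <= i < m with gcd(i, m) = 1."""
    return [i for i in range(1, m) if gcd(i, m) == 1]


def canonical_embedding(coeffs, m):
    """sigma(a) = (a(zeta^i))_{i in Z_m^*} with zeta = exp(2*pi*i/m).

    coeffs are the power-basis coefficients a_0, ..., a_{N-1} of a.
    """
    result = []
    for i in unit_indices(m):
        z = cmath.exp(2j * math.pi * i / m)
        result.append(sum(c * z ** j for j, c in enumerate(coeffs)))
    return result


def coeff_norm(coeffs):
    """Standard l_inf norm of the coefficient vector."""
    return max(abs(c) for c in coeffs)


def can_norm(coeffs, m):
    """Canonical embedding norm ||sigma(a)||_inf."""
    return max(abs(x) for x in canonical_embedding(coeffs, m))


def trace(coeffs, m):
    """Tr_{K/Q}(a) as the sum of all embeddings, rounded (it is an integer for a in R)."""
    return round(sum(canonical_embedding(coeffs, m)).real, 9)
```

For example, with $m = 8$ and $a = 1 + \zeta_8$ the sketch gives $\mathrm{Tr}_{K/\mathbb{Q}}(a) = 4$, $\|a\|_\infty = 1$ and $\|a\|_\infty^{\mathrm{can}} = 2\cos(\pi/8) \approx 1.85$; with $m = 9$ and $a = \zeta_9^3$ it gives a trace of $-3$.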

Let $m'$ be a positive integer such that $m' \mid m$. As before we define $K' = \mathbb{Q}(\zeta_{m'})$ and $S = \mathbb{Z}[\zeta_{m'}]$, such that $K'$ has degree $n = \phi(m')$ over $\mathbb{Q}$ and $\mathrm{Gal}(K'/\mathbb{Q}) \cong \mathbb{Z}_{m'}^*$. It is straightforward to show that $K$ and $R$ are a field and a ring extension of $K'$ and $S$, respectively, both of dimension $N/n$. In particular we can see $S$ as a subring of $R$ via the ring embedding that maps $\zeta_{m'} \mapsto \zeta_m^{m/m'}$.

It is a standard fact that if $\mathbb{Q} \subset K' \subset K$ is a tower of number fields, then $\mathrm{Tr}_{K/\mathbb{Q}}(a) = \mathrm{Tr}_{K'/\mathbb{Q}}(\mathrm{Tr}_{K/K'}(a))$, and that all the $K'$-linear maps $L : K \to K'$ are exactly the maps of the form $\mathrm{Tr}_{K/K'}(r \cdot a)$, for some $r \in K$.

2.2  Plaintext  Slots 

Let $p$ be a prime integer, coprime to $m$, and $R_p$ the localisation of $R$ at $p$. The polynomial $\Phi_m(X)$ factors modulo $p$ into irreducible factors, i.e. $\Phi_m(X) = \prod_{i=1}^{\ell^{(m)}} F_i(X) \pmod{p}$. Each $F_i(X)$ has degree $d^{(m)} = \phi(m)/\ell^{(m)}$, where $d^{(m)}$ is the multiplicative order of $p$ in $\mathbb{Z}_m^*$. Looking ahead, each of these factors corresponds to a “plaintext slot”, i.e.

\[ R_p \cong \mathbb{Z}_p[X]/F_1(X) \times \cdots \times \mathbb{Z}_p[X]/F_{\ell^{(m)}}(X) \cong \left( \mathbb{F}_{p^{d^{(m)}}} \right)^{\ell^{(m)}}. \]

More precisely, we have isomorphisms $\psi_i : \mathbb{Z}_p[X]/F_i(X) \to \mathbb{F}_{p^{d^{(m)}}}$, $i = 1, \dots, \ell^{(m)}$, that allow us to represent $\ell^{(m)}$ plaintext elements of $\mathbb{F}_{p^{d^{(m)}}}$ as a single element in $R_p$. By the Chinese Remainder Theorem, addition and multiplication correspond to SIMD operations on the slots, and this allows us to process $\ell^{(m)}$ input values at once.
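The slot structure depends only on $m$ and $p$, so it is easy to compute. The sketch below (plain Python, helper names ours) returns $N = \phi(m)$, the slot degree $d^{(m)}$ as the multiplicative order of $p$ modulo $m$, and the number of slots $\ell^{(m)} = N/d^{(m)}$; it does not compute the factors $F_i(X)$ themselves.

```python
from math import gcd


def euler_phi(m):
    """phi(m) by direct counting (fine for small m)."""
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)


def multiplicative_order(p, m):
    """Smallest d >= 1 with p^d = 1 (mod m); requires m >= 2 and gcd(p, m) = 1."""
    if m < 2 or gcd(p, m) != 1:
        raise ValueError("need m >= 2 and p coprime to m")
    d, x = 1, p % m
    while x != 1:
        x = (x * p) % m
        d += 1
    return d


def slot_structure(m, p):
    """Return (N, d, ell): Phi_m(X) mod p splits into ell irreducible factors of degree d."""
    N = euler_phi(m)
    d = multiplicative_order(p, m)
    return N, d, N // d
```

For instance `slot_structure(31, 2)` returns `(30, 5, 6)`, and `slot_structure(8, 3)` returns `(4, 2, 2)`, matching the factorization $X^4 + 1 \equiv (X^2 + X + 2)(X^2 + 2X + 2) \pmod 3$.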


2.3  Ring  Switching 

As mentioned in the introduction, our technique uses a method for ring/field switching from [17] so as to aid efficiency. We use two different cyclotomic rings $R$ and $S$ such that $S \subset R$. This procedure permits us to transform a ciphertext $\mathsf{ct} \in (R_q)^2$ corresponding to a plaintext $\mu \in R_p$ with respect to a secret key $\mathsf{sk} \in R$ into a ciphertext $\mathsf{ct}' \in (S_q)^2$ corresponding to a plaintext $\mu' \in S_p$ with respect to a secret key $\mathsf{sk}' \in S$. The security of this method relies on the hardness of the ring-LWE problem in $S$ ([25]). At a high level the ring switching consists of three steps. Given an input ciphertext $\mathsf{ct} \in (R_q)^2$:

- First, it switches the secret key; it uses the “classical” key-switching ([6], [5]), getting a ciphertext $\tilde{\mathsf{ct}} \in (R_q)^2$, still encrypting $\mu \in R_p$, but with respect to a secret key $\mathsf{sk}' \in S$.

- Second, it multiplies $\tilde{\mathsf{ct}}$ by a fixed element $r \in R$, which is determined by an $S$-linear function $L : R_p \to S_p$ corresponding to the induced projection function $P : (\mathbb{F}_{p^d})^{\ell^{(m)}} \to (\mathbb{F}_{p^d})^{\ell^{(m')}}$ (see [17] for details).

- Finally, it applies the trace function $\mathrm{Tr}_{R/S} : R \to S$ to the result. In this way the output of the ring switching is a ciphertext $\mathsf{ct}' \in (S_q)^2$ with respect to the secret key $\mathsf{sk}'$, encrypting the plaintext $\mu' = L(\mu)$.

We conclude this section by noting that, while big-ring ciphertexts correspond to $\ell^{(m)}$ plaintext slots, small-ring ciphertexts only correspond to $\ell^{(m')} < \ell^{(m)}$ plaintext slots. The input ciphertexts to our bootstrapping procedure are defined over $(S_q)^2$, and so are of degree $n$ and contain $\ell^{(m')}$ slots. We take $\ell^{(m)}/n$ of these ciphertexts and use the dec-eval map to encode the coefficients of the plaintext polynomials in the slots of a single big-ring ciphertext. Eventually, via ring switching and polynomial interpolation, we return to $\ell^{(m)}/n$ ciphertexts which have been bootstrapped and are at level one (or more). These fresh ciphertexts may be defined over the big ring or the small ring (depending on when ring switching occurs). However, our parameter estimates imply that ring switching is best performed at the lowest level possible, and so our bootstrapped ciphertexts will be in the big ring. We could encode all of the slots of the bootstrapped ciphertexts in a single big-ring ciphertext, or not, depending on the application, since slot manipulation is a linear operation.
3 The BGV Somewhat Homomorphic Encryption Scheme


In this section we outline what we need about the BGV SHE scheme [5]. As anticipated in Section 2, we present the scheme with the option of utilizing two rings, and hence at some point we will make use of the ring/field switching procedure from [17]. We first define two rings $R = \mathbb{Z}[X]/F(X)$ and $S = \mathbb{Z}[X]/f(X)$, where $F(X)$ (resp. $f(X)$) is an irreducible polynomial over $\mathbb{Z}$ of degree $N$ (resp. $n$). In practice both $F(X)$ and $f(X)$ will likely be cyclotomic polynomials. We assume that $n$ divides $N$, and so there is an embedding $\iota : S \to R$ which maps elements in $S$ to their appropriate equivalent in $R$. The map $\iota$ can be expressed as a linear mapping from the coefficients of the polynomial representation of the elements in $S$ to the coefficients of the polynomial representation of the elements in $R$. In this way we can consider $S$ to be a subring of $R$.

Let $R_q$ (resp. $S_q$) denote the localisation of $R$ (resp. $S$) at $q$, i.e. $\mathbb{Z}_q[X]/F(X)$ (resp. $\mathbb{Z}_q[X]/f(X)$), which can be constructed for any positive integer $q$. Let $p$ be a prime number which does not ramify in either $R$ or $S$. Since the rings are Galois, the ring $R_p$ (resp. $S_p$) splits into $\ell^{(m)}$ (resp. $\ell^{(m')}$) “slots”, with each slot being a finite field extension of $\mathbb{F}_p$ of degree $d^{(m)} = N/\ell^{(m)}$ (resp. $d^{(m')} = n/\ell^{(m')}$). We make the assumption that $n$ divides $\ell^{(m)}$. This is not strictly necessary, but it ensures that we can perform bootstrapping of a single ciphertext with the smallest amount of memory. In fact our method will support the bootstrapping of $\ell^{(m)}/n$ ciphertexts in parallel.

There will be two secret keys for our scheme, depending on whether the ciphertexts/plaintexts are associated with the ring $R$ or the ring $S$. We denote these secret keys by $\mathsf{sk}^{(R)}$ and $\mathsf{sk}^{(S)}$, which are “small” elements in the ring $R$ (resp. $S$). The modulus $q = q_0 = p_0$ will denote the smallest modulus in the set of BGV levels. Fresh ciphertexts are defined for the modulus $Q = q_L = \prod_{i=0}^{L} p_i$ and live in the ring $R_Q^2$ (thus at some point we not only perform modulus switching but also ring switching). We assume $L_1$ levels are associated with the big ring $R$ and $L_2$ levels are associated with the small ring $S$, hence $L_1 + L_2 = L$ (level zero is clearly associated with the small ring $S$, but we do not count it in the number of levels in $L_2$). Thus we encrypt at level $L$ and perform standard homomorphic operations down to level zero, with a single field switch at level $L_2 + 1$. For ease of analysis we assume no multiplications are performed at level $L_2 + 1$. This means that we can evaluate a depth $L - 1$ circuit.

A ciphertext at level $i > L_2$, encrypting a message $\mu \in R_p$, is a pair $\mathsf{ct} = (c_0, c_1) \in R_{q_i}^2$, where $q_i = \prod_{j=0}^{i} p_j$, such that

\[ \left( c_0 + \mathsf{sk}^{(R)} \cdot c_1 \pmod{q_i} \right) \pmod{p} = \mu. \]

We let $\mathsf{Enc}_{\mathsf{pk}}(\mu)$ denote the encryption of a message $\mu \in R_p$; this produces a ciphertext at level $L$. A similar definition holds for ciphertexts at level $i \le L_2$, for messages in $S_p$ and secret keys/ciphertext elements in $S_{q_i}$. When performing a ring switching operation between levels $L_2 + 1$ and $L_2$, the $\ell^{(m)}$ plaintext slots associated with the input ciphertext at level $L_2 + 1$ become associated with $\ell^{(m)}/\ell^{(m')}$ distinct ciphertexts at level $L_2$.
We want to “bootstrap” a set of BGV ciphertexts. Each of these ciphertexts is a pair $\mathsf{ct}_j = (c_0^{(j)}, c_1^{(j)}) \in S_q^2$, for $j = 1, \dots, \ell^{(m)}/n$, such that

\[ \left( c_0^{(j)} + \mathsf{sk}^{(S)} \cdot c_1^{(j)} \pmod{q} \right) \pmod{p} = \mu_j, \quad j = 1, \dots, \ell^{(m)}/n. \]
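The decryption equation can be checked on a toy instance. The sketch below is not BGV as specified in [5]: it assumes a secret-key variant, a single modulus $q$ (no levels or modulus switching), ternary noise, and the power-of-two cyclotomic ring $\mathbb{Z}_q[X]/(X^n + 1)$ in place of a general $S_q$; the parameters are far too small to be secure. It only shows why taking the centred representative of $c_0 + \mathsf{sk} \cdot c_1 \bmod q$ and then reducing modulo $p$ returns $\mu$: the centred value equals $\mu + p \cdot e$ as long as the noise $e$ is small.

```python
import random


def ring_mul(a, b, q):
    """Product in Z_q[X]/(X^n + 1) of two length-n coefficient lists."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:  # X^n = -1 in this ring
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c


def centered(x, q):
    """Representative of x mod q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


def encrypt(mu, sk, q, p, rng):
    """Toy secret-key encryption with c0 + sk*c1 = mu + p*e (mod q)."""
    n = len(mu)
    c1 = [rng.randrange(q) for _ in range(n)]
    e = [rng.randint(-1, 1) for _ in range(n)]
    sc1 = ring_mul(sk, c1, q)
    c0 = [(mu[i] + p * e[i] - sc1[i]) % q for i in range(n)]
    return c0, c1


def decrypt(ct, sk, q, p):
    """((c0 + sk*c1) mod q) mod p, taking the centred lift before reducing mod p."""
    c0, c1 = ct
    sc1 = ring_mul(sk, c1, q)
    return [centered(c0[i] + sc1[i], q) % p for i in range(len(c0))]
```

With, say, $n = 16$, $q = 12289$ and $p = 3$, the centred value lies in $[-p, 2p - 1]$, far inside $(-q/2, q/2]$, so decryption always recovers $\mu$.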


4 Evaluating the Map $\mathrm{red} \circ \mathrm{rep} : \mathbb{Z}_q^+ \to \mathbb{F}_p$ (Simple Version)


As explained in the introduction, at the heart of most bootstrapping procedures is a method to evaluate the induced mapping $\mathrm{red} \circ \mathrm{rep} : \mathbb{Z}_q^+ \to \mathbb{F}_p$. In this section we present our simpler technique for doing this, based on polynomials over $\mathbb{F}_p$; in Section 7 we present a more general (and, in terms of depth, more complicated) technique based on elliptic curves. The key, in this and in all techniques, is to find a representation $G$ for $\mathbb{Z}_q^+$ for which the reduction modulo $p$ map can be evaluated algebraically over $\mathbb{F}_p$. This means that the representation of $\mathbb{Z}_q$ must be defined over $\mathbb{F}_p$. Prior work has looked at the bit-representation (when $p = 2$), the $p$-adic representation and a matrix representation; we use a polynomial representation.

We select a coprime factorization $q = \prod_{i=1}^{t} e_i$ (with the $e_i$ not necessarily prime, but pairwise coprime), such that $e_i$ divides $p^{k_i} - 1$ for some $k_i$. Since $\mathbb{F}_{p^{k_i}}^*$ is cyclic, we know that $\mathbb{F}_{p^{k_i}}^*$ has a subgroup of order $e_i$. We fix a polynomial representation of $\mathbb{F}_{p^{k_i}}$, i.e. an irreducible polynomial $f_i(x)$ of degree $k_i$ such that $\mathbb{F}_{p^{k_i}} = \mathbb{F}_p[x]/f_i(x)$. Let $g_i \in \mathbb{F}_{p^{k_i}}^*$ denote a fixed element of order $e_i$.

By the Chinese Remainder Theorem we therefore have a group embedding

\[ \mathrm{rep} : \mathbb{Z}_q^+ \longrightarrow G = \prod_{i=1}^{t} \mathbb{F}_{p^{k_i}}^*, \qquad a \longmapsto \left( g_1^{a_1}, \dots, g_t^{a_t} \right), \tag{1} \]

where $a_i = a \pmod{e_i}$. Without loss of generality we can assume that the $k_i$ are also coprime, by modifying the decomposition of $q$ into coprime $e_i$'s. Given this group representation of $\mathbb{Z}_q^+$ in $G$, addition in $\mathbb{Z}_q^+$ translates into multiplication in $G$, with one addition in $\mathbb{Z}_q^+$ translating into $M = \frac{1}{2} \sum_{i=1}^{t} k_i \cdot (k_i + 1)$ multiplications in $\mathbb{F}_p$ (and a comparable number of additions, assuming schoolbook multiplication is used). Each element in the image of rep requires $E = \sum_{i=1}^{t} k_i$ elements in $\mathbb{F}_p$ to represent it.
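As a toy illustration of (1), the following sketch builds rep for $p = 3$ and $q = 104 = 8 \cdot 13$, using $e_1 = 8 \mid 3^2 - 1$ and $e_2 = 13 \mid 3^3 - 1$, so $k = (2, 3)$ and $E = 5$. The irreducible polynomials $f_1(x) = x^2 + 1$ and $f_2(x) = x^3 - x - 1$ are our own choice, and the generators $g_i$ are found by brute force, which is only feasible for such tiny fields. The checks confirm that rep turns addition in $\mathbb{Z}_q^+$ into component-wise multiplication in $G$ and is injective.

```python
from itertools import product

P = 3
# (e_i, f_i): e_1 = 8 divides 3^2 - 1 and e_2 = 13 divides 3^3 - 1.
# f_1 = x^2 + 1 and f_2 = x^3 - x - 1 are irreducible over F_3 (low degree first).
FACTORS = [(8, [1, 0, 1]), (13, [2, 2, 0, 1])]
Q = 8 * 13  # q = 104, with k = (2, 3) and hence E = 5


def field_mul(a, b, f):
    """Multiply a, b in F_P[x]/(f) for monic f; elements are coefficient tuples."""
    k = len(f) - 1
    prod = [0] * (2 * k - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % P
    for deg in range(2 * k - 2, k - 1, -1):  # use x^k = -(f_0 + ... + f_{k-1} x^{k-1})
        c, prod[deg] = prod[deg], 0
        for i in range(k):
            prod[deg - k + i] = (prod[deg - k + i] - c * f[i]) % P
    return tuple(prod[:k])


def field_pow(a, n, f):
    """Square-and-multiply exponentiation in F_P[x]/(f)."""
    result = tuple([1] + [0] * (len(f) - 2))
    while n:
        if n & 1:
            result = field_mul(result, a, f)
        a = field_mul(a, a, f)
        n >>= 1
    return result


def prime_factors(n):
    out, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            out.add(d)
            n //= d
        d += 1
    if n > 1:
        out.add(n)
    return out


def element_of_order(e, f):
    """Brute-force search for an element of exact order e in (F_P[x]/(f))^*."""
    one = tuple([1] + [0] * (len(f) - 2))
    for g in product(range(P), repeat=len(f) - 1):
        if (field_pow(g, e, f) == one
                and all(field_pow(g, e // r, f) != one for r in prime_factors(e))):
            return g
    raise ValueError("no element of order %d" % e)


GENS = [element_of_order(e, f) for e, f in FACTORS]


def rep(a):
    """rep(a) = (g_1^(a mod e_1), ..., g_t^(a mod e_t)), as in (1)."""
    return tuple(field_pow(g, a % e, f) for g, (e, f) in zip(GENS, FACTORS))


def group_mul(x, y):
    """Component-wise multiplication in G = prod_i F_{p^{k_i}}^*."""
    return tuple(field_mul(xi, yi, f) for xi, yi, (_, f) in zip(x, y, FACTORS))
```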

There will be a map $\mathrm{red} : G \to \mathbb{F}_p$ such that $\mathrm{red} \circ \mathrm{rep}$ is the reduction modulo $p$ map, and red can be defined algebraically from the coefficient representation of $G$ to $\mathbb{F}_p$. Here algebraically refers to algebraic operations over $\mathbb{F}_p$. An arbitrary algebraic expression on $E$ variables of degree $d$ will contain $\binom{E+d}{d}$ terms. Thus, by interpolating, we expect the degree $d$ of the map red to be the smallest $d$ such that $\binom{E+d}{d} \ge q$. Since $\binom{E+d}{d} \ge (1 + d/E)^E$, this means we expect $d \approx E \cdot (q^{1/E} - 1)$. Thus the larger $E$ is, the smaller $d$ will be. This interpolating function needs to be created once and for all for any given set of parameters, thus we ignore the cost of generating it in our analysis.
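The interpolation step can be sketched in the same spirit. The code below uses a single factor ($t = 1$) with the toy parameters $p = 3$, $q = 13$, $k = E = 3$ and $f(x) = x^3 - x - 1$ (our choice), and finds a polynomial red in $E$ variables over $\mathbb{F}_p$ with $\mathrm{red}(\mathrm{rep}(a)) = a \bmod p$, where $a$ is lifted to $(-q/2, q/2]$. It increases $d$ until the linear system over $\mathbb{F}_p$ on all monomials of total degree at most $d$ becomes consistent; the search must stop by $d = E(p - 1)$, since every function on $\mathbb{F}_p^E$ is a polynomial of degree at most $p - 1$ in each variable. The degree found for a specific point set can differ from the heuristic estimate, which only counts monomials.

```python
from itertools import combinations_with_replacement, product

P, Q = 3, 13             # 13 divides 3^3 - 1 = 26, so t = 1 and k = 3
F = [2, 2, 0, 1]         # f(x) = x^3 - x - 1, irreducible over F_3 (low degree first)
K = len(F) - 1           # E = K = 3 coordinates per group element
ONE = (1,) + (0,) * (K - 1)


def field_mul(a, b):
    prod = [0] * (2 * K - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % P
    for deg in range(2 * K - 2, K - 1, -1):
        c, prod[deg] = prod[deg], 0
        for i in range(K):
            prod[deg - K + i] = (prod[deg - K + i] - c * F[i]) % P
    return tuple(prod[:K])


def powers(g):
    """[g^0, g^1, ..., g^Q] in F_P[x]/(f)."""
    out = [ONE]
    for _ in range(Q):
        out.append(field_mul(out[-1], g))
    return out


# Q is prime, so any g != 1 with g^Q = 1 has order exactly Q.
G_GEN = next(g for g in product(range(P), repeat=K) if g != ONE and powers(g)[Q] == ONE)
REP = powers(G_GEN)[:Q]  # REP[a] = rep(a) = g^a


def red_target(a):
    """Reduction mod p of the centred lift of a in (-q/2, q/2]."""
    return (a - Q if a > Q // 2 else a) % P


def solve_mod_p(A, b):
    """One solution of A c = b over F_P by Gauss-Jordan elimination, or None."""
    rows, cols = len(A), len(A[0])
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if aug[i][c]), None)
        if piv is None:
            continue
        aug[r], aug[piv] = aug[piv], aug[r]
        inv = pow(aug[r][c], P - 2, P)
        aug[r] = [(x * inv) % P for x in aug[r]]
        for i in range(rows):
            if i != r and aug[i][c]:
                fac = aug[i][c]
                aug[i] = [(x - fac * y) % P for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    if any(aug[i][-1] for i in range(r, rows)):
        return None  # inconsistent: degree d is too small
    sol = [0] * cols
    for i, c in enumerate(pivots):
        sol[c] = aug[i][-1]
    return sol


def eval_monomial(mono, x):
    v = 1
    for var in mono:
        v = (v * x[var]) % P
    return v


def interpolate_red():
    """Smallest total degree d, monomials and coefficients with red(rep(a)) = a mod p."""
    values = [red_target(a) for a in range(Q)]
    for d in range(1, K * (P - 1) + 1):
        monos = [m for deg in range(d + 1) for m in combinations_with_replacement(range(K), deg)]
        A = [[eval_monomial(m, x) for m in monos] for x in REP]
        coeffs = solve_mod_p(A, values)
        if coeffs is not None:
            return d, monos, coeffs
    raise ValueError("no interpolating polynomial found")


def red(x, monos, coeffs):
    return sum(c * eval_monomial(m, x) for m, c in zip(monos, coeffs)) % P
```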

The algebraic circuit which implements the map red can hence be described as a circuit of depth $\lceil \log_2 d \rceil$ which requires $D(E, d) = \binom{E+d}{d} - (E + 1)$ multiplications (corresponding to the number of distinct monomials in $E$ variables of degree between two and $d$). In particular, by approximating $E \approx \log_2(q)/\log_2(p)$, so that $q^{1/E} \approx p$ and $d \approx E \cdot (p - 1)$, we obtain that the circuit implementing the map red has depth

\[ \lceil \log_2 d \rceil \approx \log_2(p - 1) + \log_2(\log_2(q)) - \log_2(\log_2(p)). \]
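The quantities in the paragraph above are simple to tabulate. The sketch below (function names ours) finds the smallest $d$ with $\binom{E+d}{d} \ge q$, the resulting depth $\lceil \log_2 d \rceil$ and the monomial count $D(E, d)$, together with the heuristic depth estimate obtained from $E \approx \log_2(q)/\log_2(p)$.

```python
from math import ceil, comb, log2


def red_parameters(q, E):
    """Smallest d with C(E+d, d) >= q, the depth ceil(log2 d) and D(E, d)."""
    d = 1
    while comb(E + d, d) < q:
        d += 1
    depth = ceil(log2(d))
    mults = comb(E + d, d) - (E + 1)  # monomials of degree 2..d
    return d, depth, mults


def heuristic_depth(p, q):
    """log2(p - 1) + log2(log2 q) - log2(log2 p), using E ~ log2(q)/log2(p)."""
    return log2(p - 1) + log2(log2(q)) - log2(log2(p))
```

For example, `red_parameters(104, 5)` returns `(4, 2, 120)`: degree 4 is the first with $\binom{9}{4} = 126 \ge 104$, and $126 - 6 = 120$ monomials of degree between two and four remain.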

We  pause  to  note  the  following.  By  selecting  a  large  finite  field  it  would  appear  at 
first  glance  that  one  can  reduce  our  degree  d  even  further.  This  however  comes  at  the 
cost  of  having  more  terms,  i.e.  a  larger  value  of  E.  This  in  turn  increases  the  overall 
complexity  of  the  method  (i.e.  the  number  of  multiplications  needed)  but  not  the  depth. 

5  A  Product  of  Powers  of  SIMD  Vectors 

Before proceeding with our method to turn the above methodology for reduction modulo $p$ into a bootstrapping method for our set of BGV ciphertexts, we first examine how to homomorphically compute the following function

\[ \mathbf{v} \cdot \prod_{k=0}^{\Lambda} \mathbf{v}_k^{\mathbf{M}_k}, \]

where each $\mathbf{v}$ and $\mathbf{v}_k$, $k = 0, \dots, \Lambda$, represents a set of $E$ ciphertexts, each of which encodes (in a SIMD manner) $\ell^{(m)}$ elements in $\mathbb{F}_p$. The multiplication of two such sets of $E$ ciphertexts is done with respect to the multiplication operation in $G$, and thus requires $M$ homomorphic multiplications (this is for our simple variant of red; for the variant based on elliptic curves the number of ciphertexts and the complexity of the group operation in $G$ increase a little). The values $\mathbf{M}_k$ are matrices in $\mathrm{Mat}_{\ell^{(m)} \times \ell^{(m)}}(\mathbb{F}_p)$. By the notation $\mathbf{u} = \mathbf{v}^{\mathbf{M}}$, where $\mathbf{M} = (m_{i,j})$, we mean the vector with components

\[ \mathbf{u}_i = \prod_{j=1}^{\ell^{(m)}} \mathbf{v}_j^{m_{i,j}} \in G, \quad i = 1, \dots, \ell^{(m)}. \]

Notice that each $\mathbf{u}_i$ and $\mathbf{v}_j$ is a vector of $E$ elements in $\mathbb{F}_p$ representing a single element in $G$. In what follows we divide this operation into three sub-procedures and compute the number of multiplications, and the depth required, to evaluate the function.

5.1  SIMD  Raising  of  an  Encrypted  Vector  to  the  Power  of  a  Public  Vector 

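The first step is to take a vector $\mathbf{v}$ which is the SIMD encryption of $E$ sets of $\ell^{(m)}$ elements in $\mathbb{F}_p$, i.e. it represents $\ell^{(m)}$ elements in $G$. We then raise $\mathbf{v}$ to the power of some public vector $\mathbf{c} = (c_1, \dots, c_{\ell^{(m)}})$, i.e. we want to compute

\[ \mathbf{x} = \mathbf{v}^{\mathbf{c}}. \]

In particular $\mathbf{v}$ actually consists of $E$ vectors, each with $\ell^{(m)}$ components in their slots. We write

\[ \mathbf{v} = (\mathbf{v}_{1,0}, \dots, \mathbf{v}_{1,k_1-1}, \dots, \mathbf{v}_{t,0}, \dots, \mathbf{v}_{t,k_t-1}). \]

Note, multiplying such a vector by another vector of the same form requires $M$ homomorphic multiplications and depth 1. We first write

\[ \mathbf{c} = \mathbf{c}_0 + 2 \cdot \mathbf{c}_1 + \cdots + 2^{\lceil \log_2 p \rceil} \cdot \mathbf{c}_{\lceil \log_2 p \rceil}, \]

where $\mathbf{c}_i \in \{0, 1\}^{\ell^{(m)}}$. We let $\mathbf{c}_i^*$ denote the bitwise complement of $\mathbf{c}_i$. Thus to compute $\mathbf{x} = \mathbf{v}^{\mathbf{c}}$ we use the following three steps: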
Step 1: Compute $\mathbf{v}^{(2^i)}$ for $i = 1, \dots, \lceil \log_2 p \rceil$, by which we mean every element in $\mathbf{v}$ is raised to the power $2^i$. This requires $\lceil \log_2 p \rceil \cdot M$ homomorphic multiplications and depth $\lceil \log_2 p \rceil$.


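Step 2: For $i \in \{0, \dots, \lceil \log_2 p \rceil\}$, $j \in \{1, \dots, t\}$ and $k \in \{0, \dots, k_j - 1\}$ compute

\[ \mathbf{w}^{(i)}_{j,k} = \begin{cases} \mathsf{Enc}_{\mathsf{pk}}(\mathbf{c}_i) \cdot \mathbf{v}^{(2^i)}_{j,k} & k \neq 0, \\ \mathsf{Enc}_{\mathsf{pk}}(\mathbf{c}_i) \cdot \mathbf{v}^{(2^i)}_{j,k} + \mathsf{Enc}_{\mathsf{pk}}(\mathbf{c}_i^*) & k = 0. \end{cases} \]

Here $\mathsf{Enc}_{\mathsf{pk}}(\mathbf{c}_i)$ means encrypting the vector $\mathbf{c}_i$ so that the $j$th component of $\mathbf{c}_i$ is mapped to the $j$th plaintext slot of the ciphertext. The above procedure selects the values which we want to include in the final product: wherever a bit of $\mathbf{c}_i$ is zero, the slot is replaced by the identity of $G$, whose coefficient vector is $(1, 0, \dots, 0)$. This involves a homomorphic multiplication by a constant in $\{0, 1\}$ and the homomorphic addition of a constant in $\{0, 1\}$ for each entry, and so is essentially fast (and only moderately bad on the noise, so we will ignore this and call it depth 1/2).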


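Step 3: We now compute $\mathbf{x}$ as

\[ \mathbf{x} = \prod_{i=0}^{\lceil \log_2 p \rceil} \mathbf{w}^{(i)}, \]

where we think of $\mathbf{w}^{(i)}$ as a vector of $E$ SIMD encryptions. This step (assuming a balanced multiplication tree) requires depth $\lceil \log_2 \lceil \log_2 p \rceil \rceil$ and $M \cdot \lceil \log_2 p \rceil$ multiplications.

Executing all three steps above therefore requires a depth of $\tfrac{1}{2} + \lceil \log_2 p \rceil + \lceil \log_2 \lceil \log_2 p \rceil \rceil$, and $2 \cdot M \cdot \lceil \log_2 p \rceil$ multiplications.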
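The three steps can be simulated on plaintexts to check the masking trick. The sketch below assumes $t = 1$ and $G = \mathbb{F}_9^*$ with $\mathbb{F}_9 = \mathbb{F}_3[x]/(x^2 + 1)$, stores $\mathbf{v}$ as $E = 2$ coefficient vectors of $\ell$ slots each (mirroring the $E$ SIMD ciphertexts), and replaces every homomorphic operation by the corresponding operation on plaintext values; it says nothing about noise.

```python
from math import ceil, log2

P = 3
F = [1, 0, 1]  # f(x) = x^2 + 1, irreducible over F_3, so G = F_9^* (t = 1, k = 2)
K = len(F) - 1  # each group element is a vector of E = K coefficients
ONE = (1,) + (0,) * (K - 1)


def field_mul(a, b):
    prod = [0] * (2 * K - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % P
    for deg in range(2 * K - 2, K - 1, -1):
        c, prod[deg] = prod[deg], 0
        for i in range(K):
            prod[deg - K + i] = (prod[deg - K + i] - c * F[i]) % P
    return tuple(prod[:K])


def simd_mul(u, v):
    """Slot-wise product in G; u, v are K coefficient vectors of ell slots each."""
    ell = len(u[0])
    slots = [field_mul(tuple(u[k][s] for k in range(K)), tuple(v[k][s] for k in range(K)))
             for s in range(ell)]
    return [[slots[s][k] for s in range(ell)] for k in range(K)]


def simd_power_public(v, c):
    """x with x_s = v_s^(c_s) for a public exponent vector c with entries in [0, p)."""
    nbits = ceil(log2(P)) + 1  # bits c_0, ..., c_{ceil(log2 p)}
    # Step 1: repeated squaring gives v^(2^i) for i = 0, ..., ceil(log2 p).
    squares = [v]
    for _ in range(nbits - 1):
        squares.append(simd_mul(squares[-1], squares[-1]))
    # Step 2: keep v^(2^i) in slot s if bit i of c_s is 1, otherwise put the
    # identity (1, 0, ..., 0): multiply every coefficient by the bit and add
    # the complemented bit to the constant coefficient only.
    ws = []
    for i, sq in enumerate(squares):
        bits = [(cs >> i) & 1 for cs in c]
        w = [[(b * x) % P for b, x in zip(bits, sq[k])] for k in range(K)]
        w[0] = [(x + 1 - b) % P for x, b in zip(w[0], bits)]
        ws.append(w)
    # Step 3: balanced multiplication tree over the masked vectors.
    while len(ws) > 1:
        ws = [simd_mul(ws[i], ws[i + 1]) if i + 1 < len(ws) else ws[i]
              for i in range(0, len(ws), 2)]
    return ws[0]
```

Each slot can carry a different exponent, which is what allows the public vectors $\mathbf{d}_i$ of the matrix-vector step to be applied slot by slot.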


5.2 Computing $\mathbf{u} = \mathbf{v}^{\mathbf{M}}$

Given the previous subsection, we can now evaluate $\mathbf{u}_i = \prod_{j=1}^{\ell^{(m)}} \mathbf{v}_j^{m_{i,j}}$, for $i = 1, \dots, \ell^{(m)}$, where $\mathbf{v}$ is a SIMD vector consisting of $E$ vectors encoding $\ell^{(m)}$ elements, as is the output $\mathbf{u}$. For this we use a trick for systolic matrix-vector multiplication from [22], but converted into multiplicative notation.

We write the matrix $\mathbf{M}$ as $\ell^{(m)}$ SIMD vectors $\mathbf{d}_i$, for $i = 1, \dots, \ell^{(m)}$, so that $d_{i,j} = m_{j,\, ((i+j-2) \bmod \ell^{(m)}) + 1}$ for $j = 1, \dots, \ell^{(m)}$. We let $\mathbf{v} \lll i$ denote the SIMD vector $\mathbf{v}$ rotated left $i$ positions (with wrap around). Since $\mathbf{v}$ actually consists of $E$ SIMD vectors, this can be performed using time proportional to $E$ multiplications, but with no addition to the overall depth (it is expensive in terms of time, but cheap in terms of noise; see the operations in Table 1 of [22]).
Step 1: First compute, for $i = 1, \dots, \ell^{(m)}$,

\[ \mathbf{x}_i = (\mathbf{v} \lll (i - 1))^{\mathbf{d}_i} \]

using the method previously described in Subsection 5.1. This requires a depth of $\tfrac{1}{2} + \lceil \log_2 p \rceil + \lceil \log_2 \lceil \log_2 p \rceil \rceil$, and essentially $\ell^{(m)} \cdot (E + 2 \cdot M \cdot \lceil \log_2 p \rceil)$ multiplications.

Step 2: All we need to do now is compute

\[ \mathbf{u} = \prod_{i=1}^{\ell^{(m)}} \mathbf{x}_i. \]

This requires (assuming a balanced multiplication tree) a depth of $\lceil \log_2 \ell^{(m)} \rceil$ and $\ell^{(m)}$ multiplications in $G$.

Thus far, for the operations in Subsection 5.1 and this subsection we have used a total depth of $\tfrac{1}{2} + \lceil \log_2 \ell^{(m)} \rceil + \lceil \log_2 p \rceil + \lceil \log_2 \lceil \log_2 p \rceil \rceil$ and a cost of $\ell^{(m)} \cdot (M + E + 2 \cdot M \cdot \lceil \log_2 p \rceil)$ multiplications.
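The diagonal (systolic) arrangement can likewise be checked on plaintext values. In the sketch below the group $G$ is replaced, purely for illustration, by the multiplicative group of integers modulo 101, and indices are 0-based, so the diagonals are $d_i[j] = m_{j,(i+j) \bmod \ell}$; the result is compared against the definition $u_j = \prod_k v_k^{m_{j,k}}$.

```python
MOD = 101  # toy stand-in for G: the multiplicative group of integers mod 101


def rotate_left(v, i):
    """SIMD rotation: slot j of the result holds v[(j + i) mod ell]."""
    i %= len(v)
    return v[i:] + v[:i]


def simd_mul(u, v):
    return [(a * b) % MOD for a, b in zip(u, v)]


def simd_pow(v, e):
    """Slot-wise v_j^(e_j) with a public exponent vector e."""
    return [pow(a, b, MOD) for a, b in zip(v, e)]


def diagonals(M):
    """d_i[j] = M[j][(i + j) mod ell] (0-indexed generalised diagonals)."""
    ell = len(M)
    return [[M[j][(i + j) % ell] for j in range(ell)] for i in range(ell)]


def systolic_power(v, M):
    """u_j = prod_k v_k^(M[j][k]) using only rotations and slot-wise operations."""
    xs = [simd_pow(rotate_left(v, i), d) for i, d in enumerate(diagonals(M))]  # Step 1
    while len(xs) > 1:  # Step 2: balanced multiplication tree
        xs = [simd_mul(xs[i], xs[i + 1]) if i + 1 < len(xs) else xs[i]
              for i in range(0, len(xs), 2)]
    return xs[0]


def direct_power(v, M):
    """Reference: the definition u_j = prod_k v_k^(M[j][k])."""
    out = []
    for row in M:
        acc = 1
        for vk, mjk in zip(v, row):
            acc = acc * pow(vk, mjk, MOD) % MOD
        out.append(acc)
    return out
```

The correctness argument is a re-indexing: slot $j$ of $\mathbf{v} \lll i$ holds $v_{(i+j) \bmod \ell}$, and its exponent $d_i[j]$ is exactly the matrix entry that multiplies that component in row $j$.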

5.3 Computing $\mathbf{v} \cdot \prod_{k=0}^{\Lambda} \mathbf{v}_k^{\mathbf{M}_k}$

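To evaluate our required output we need to execute the above steps $\Lambda$ times, in order to obtain the elements $\mathbf{v}_k^{\mathbf{M}_k}$, which we then multiply together. Thus in total we have a depth of

\[ \tfrac{1}{2} + \lceil \log_2 \ell^{(m)} \rceil + \lceil \log_2 p \rceil + \lceil \log_2 \lceil \log_2 p \rceil \rceil + \lceil \log_2 \Lambda \rceil \]

and a cost of

\[ \Lambda \cdot \left( M + \ell^{(m)} \cdot \left( M + E + 2 \cdot M \cdot \lceil \log_2 p \rceil \right) \right) \]

multiplications.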
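For parameter exploration the totals above can be transcribed directly. The sketch below simply evaluates the stated depth and multiplication formulas; $M$ and $E$ are passed in because they depend on the chosen $k_i$, and the example arguments are arbitrary.

```python
from math import ceil, log2


def bootstrap_power_cost(p, ell, Lam, E, M):
    """Depth and F_p-multiplication count of v * prod_k v_k^(M_k), per Sections 5.1-5.3."""
    lp = ceil(log2(p))
    depth = 0.5 + ceil(log2(ell)) + lp + ceil(log2(lp)) + ceil(log2(Lam))
    mults = Lam * (M + ell * (M + E + 2 * M * lp))
    return depth, mults
```

For instance, `bootstrap_power_cost(3, 4, 2, 5, 20)` returns `(6.5, 880)`.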

6  Bootstrapping  a  Set  of  Ciphertexts 

To perform our bootstrapping operation we introduce another, more standard, representation:
the matrix representation of the ring $S_q$. Since $S_q$ can be considered a vector space over
$\mathbb{Z}_q$ via the usual polynomial embedding, we can associate an element $a$ with its
coefficient vector $\mathbf{a}$. We can also associate an element $b$ with an $n \times n$
matrix $M_b$ over $\mathbb{Z}_q$ such that the vector

$$\mathbf{c} = M_b \cdot \mathbf{a}$$

is the coefficient vector of $c = a \cdot b$. This representation, which associates an
element in $S_q$ to a matrix, is called the matrix representation.
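As a sanity check of the matrix representation, the following minimal Python sketch builds $M_b$ column by column (column $j$ holds the coefficients of $b \cdot X^j$) and checks that $M_b \cdot \mathbf{a}$ is the coefficient vector of $a \cdot b$. It assumes $S_q = \mathbb{Z}_q[X]/(f(X))$ for a monic $f$; the toy values $q = 17$, $f = \Phi_5$ or $X^4 + 1$, and the helper names are hypothetical, not the parameters or code used for bootstrapping.

```python
def poly_mul_mod(a, b, f, q):
    """Product a*b in Z_q[X]/(f) for monic f; coefficient lists are lowest degree first."""
    n = len(f) - 1
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % q
    # Eliminate X^d for d >= n using X^n = -(f_0 + f_1 X + ... + f_{n-1} X^{n-1}).
    for d in range(len(prod) - 1, n - 1, -1):
        c = prod[d]
        prod[d] = 0
        for k in range(n):
            prod[d - n + k] = (prod[d - n + k] - c * f[k]) % q
    return (prod + [0] * n)[:n]


def matrix_rep(b, f, q):
    """n x n matrix M_b over Z_q whose j-th column holds the coefficients of b * X^j."""
    n = len(f) - 1
    cols = []
    for j in range(n):
        x_j = [0] * n
        x_j[j] = 1
        cols.append(poly_mul_mod(b, x_j, f, q))
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def mat_vec(M, v, q):
    return [sum(m * x for m, x in zip(row, v)) % q for row in M]


# Toy check with q = 17 and f = Phi_5(X) = X^4 + X^3 + X^2 + X + 1.
q = 17
f = [1, 1, 1, 1, 1]
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
assert mat_vec(matrix_rep(b, f, q), a, q) == poly_mul_mod(a, b, f, q)
```

Because $M_b \cdot \mathbf{a} = \sum_j a_j \cdot (\text{coefficients of } b X^j)$, the identity holds by linearity; for $f = X^n + 1$ the matrix of $b = X$ is the familiar negacyclic shift.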

Recall that we want to bootstrap $\ell^{(R)}/n$ ciphertexts in one go. We also recall the maps
red and rep from Section 4 and define $\tau = \mathrm{red} \circ \mathrm{rep}$ to be the reduction modulo $p$ map
on $\mathbb{Z}_q^+$. We first extend rep and $\tau$ to the whole of $S_q^+$ by linearity, with
images in $G^n$ and $\mathbb{F}_p^n$ respectively. Similarly, we can extend rep and $\tau$ to
$(S_q^+)^{\ell^{(R)}/n}$ to obtain maps $\mathrm{rep}: (S_q^+)^{\ell^{(R)}/n} \to G^{\ell^{(R)}}$ and
$\tau: (S_q^+)^{\ell^{(R)}/n} \to \mathbb{F}_p^{\ell^{(R)}}$, as in Section 4. Again this induces a map red,
which is just the SIMD evaluation of red on the image of rep in $G^{\ell^{(R)}}$. We let
$\mathrm{rep}_{i,j}$ denote the restriction of rep to the $(i-1)$-th coefficient of the $j$-th $S_q$
component, for $1 \le i \le n$ and $1 \le j \le \ell^{(R)}/n$.
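To make the roles of rep, red and $\tau = \mathrm{red} \circ \mathrm{rep}$ concrete, here is a toy Python sketch of the finite field case with every $k_i = 1$: $p = 31$ and $q = e_1 \cdot e_2 = 3 \cdot 5$ (both $e_i$ divide $p - 1$), with $g_i$ of order $e_i$ in $\mathbb{F}_{31}^*$. These parameters and helper names are hypothetical. The sketch checks that rep turns addition in $\mathbb{Z}_q^+$ into slot-wise multiplication in $G$. It tabulates red by brute force, assuming the centred representative of $a \bmod q$, whereas the text evaluates red as a polynomial over $\mathbb{F}_p$.

```python
P = 31          # toy plaintext modulus p
E = (3, 5)      # pairwise coprime e_i, each dividing p - 1 (so every k_i = 1)
Q = 3 * 5       # q = e_1 * e_2
PRIM_ROOT = 3   # 3 generates F_31^*

# g_i = 3^((p-1)/e_i) has order exactly e_i in F_31^*.
G = tuple(pow(PRIM_ROOT, (P - 1) // e, P) for e in E)


def rep(a):
    """rep: Z_q^+ -> F_p^* x F_p^*, a |-> (g_1^(a mod e_1), g_2^(a mod e_2))."""
    return tuple(pow(g, a % e, P) for g, e in zip(G, E))


def centred(a):
    """Representative of a mod q in the interval (-q/2, q/2]."""
    a %= Q
    return a - Q if a > Q // 2 else a


# Brute-force table for red; by the CRT, rep is injective on Z_q.
RED = {rep(a): centred(a) % P for a in range(Q)}


def red(x):
    return RED[x]


# rep is a homomorphism: addition mod q becomes slot-wise multiplication.
for a in range(Q):
    for b in range(Q):
        assert rep((a + b) % Q) == tuple(x * y % P for x, y in zip(rep(a), rep(b)))

print(red(rep(12)))  # prints 28, i.e. -3 mod 31
```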

We can then rewrite the decryption equation of our $\ell^{(R)}/n$ ciphertexts as

(('^0^  +  (mod  g)J  (mod 

=  red  ^fep  . . . 

=  red  (rep(x)) , 


where  x  is  the  vector  consisting  of  Sg  elements  +sB^‘®^  •  for  j  =  1, . . . ,  /n. 

Thus, if we can compute rep(x), then to perform the bootstrap we need only evaluate (in
$\ell^{(R)}$-fold SIMD fashion) the arithmetic circuit of multiplicative depth $\lceil \log_2 d \rceil$
representing red. Since we have enough slots in the large plaintext ring, we are able to
do this homomorphically on fully packed ciphertexts. The total number of monomials
in the arithmetic circuit (i.e. the multiplications we would need to evaluate red) is
$D(E, d)$.


6.1  Homomorphically  Evaluating  rep(x) 


We wish to homomorphically evaluate rep(x) such that the output is a set of $E$
ciphertexts, and if we took the $(i + (j-1) \cdot \ell^{(R)}/n)$-th slot of each plaintext we would
obtain the $E$ values which represent $\mathrm{rep}_{i,j}(x)$. Let $\Lambda = \lceil \log q / \log p \rceil$. We add to
the public key of the SHE scheme the encryption of $\mathrm{rep}(p^k \cdot s^{(S)}, \ldots, p^k \cdot s^{(S)})$ for
$k = 0, \ldots, \Lambda$ (where each component is copied $\ell^{(R)}/n$ times). For a given $k$ this is a
set of $E$ ciphertexts, such that if we took the $(i + (j-1) \cdot \ell^{(R)}/n)$-th slot of each
plaintext we would obtain the $E$ values which represent $\mathrm{rep}_{i,j}(p^k \cdot s^{(S)})$. Let the
resulting vector of ciphertexts be denoted $\mathrm{ct}_k$, for $k = 0, \ldots, \Lambda$, where $\mathrm{ct}_k$ is a
vector of length $E$.

Let $M^{(j)}$ be the matrix representation of the second ciphertext component of
the $j$-th ciphertext that we want to bootstrap. We write

$$M^{(j)} = \sum_{k=0}^{\Lambda} p^k \cdot M^{(j)}_k,$$

where $M^{(j)}_k$ is a matrix with coefficients in $\{0, \ldots, p-1\}$. We then have that

$$\mathbf{c}^{(j)}_0 + M^{(j)} \cdot \mathbf{s} = \mathbf{c}^{(j)}_0 + \sum_{k=0}^{\Lambda} M^{(j)}_k \cdot (p^k \cdot \mathbf{s}),$$

where $\mathbf{s}$ is the vector of coefficients of the secret key $s^{(S)}$.
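The decomposition $M^{(j)} = \sum_k p^k \cdot M^{(j)}_k$ is an ordinary base-$p$ expansion applied entrywise. The Python sketch below uses hypothetical toy values $p = 5$ and $q = 97$. It computes the digit matrices with $\Lambda = \lceil \log q / \log p \rceil$ and checks the identity $M \cdot \mathbf{s} = \sum_k M_k \cdot (p^k \cdot \mathbf{s}) \pmod q$. This identity is what lets the public key hold encryptions of $p^k \cdot \mathbf{s}$, while every matrix actually applied has entries below $p$.

```python
def digit_decompose(M, p, q):
    """Split M (entries in [0, q)) into M_0, ..., M_Lambda with entries in {0, ..., p-1}
    such that M = sum_k p^k * M_k, where Lambda = ceil(log q / log p)."""
    lam = 0
    while p ** lam < q:  # smallest lam with p^lam >= q, computed without floats
        lam += 1
    digits, rest = [], [row[:] for row in M]
    for _ in range(lam + 1):
        digits.append([[x % p for x in row] for row in rest])
        rest = [[x // p for x in row] for row in rest]
    return digits


def mat_vec(M, v, q):
    return [sum(m * x for m, x in zip(row, v)) % q for row in M]


p, q = 5, 97
M = [[96, 13], [40, 77]]
s = [3, 95]
Ms = digit_decompose(M, p, q)

# Linearity: M*s = sum_k M_k * (p^k * s)  (mod q)
lhs = mat_vec(M, s, q)
rhs = [0, 0]
for k, Mk in enumerate(Ms):
    term = mat_vec(Mk, [(p ** k) * x % q for x in s], q)
    rhs = [(x + y) % q for x, y in zip(rhs, term)]
assert lhs == rhs
```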

We let $M_k = \mathrm{diag}(M^{(1)}_k, \ldots, M^{(\ell^{(R)}/n)}_k)$. We now apply

rep to both sides, which means we need to compute homomorphically the ciphertext
which represents

A  ivr<'“> 

k=0 

We are thus in the situation described in Section 5. Hence the homomorphic evaluation
of rep(x) requires a depth of

i  +  [logaPl  +  riogaTlogaPll  +  [loga  A] 


and 

A  -  +  ■{M  +  E  +  2-M  ■  riog2Pl)) 

multiplications. 


6.2  Repacking 

At this point in the bootstrapping procedure (assuming for simplicity that a ring switch
has not occurred) we have a single ciphertext ct whose slots encode the coefficients
(over the small ring) of the $\ell^{(R)}/n$ ciphertexts that we are bootstrapping. Our task is
now to extract these coefficients to produce a ciphertext (or set of ciphertexts) which
encodes the same data. Effectively this is the task of performing $\ell^{(R)}/n$ inverse Fourier
transforms (a.k.a. interpolations) over $S$ in parallel, and then encoding the result as
elements in $R$ via the embedding $\iota: S \to R$.

There are a multitude of ways of doing this step (apart from directly performing an
inverse FFT algorithm); for example, the general method of Alperin-Sheriff and Peikert
[1] could be applied. This makes the observation that applying the FFT to a vector $\mathbf{x}$ of
Fourier coefficients is essentially a linear operation, and hence we can compute it
by taking the trace of a value $a \cdot \mathbf{x}$ for some fixed constant $a$.
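The observation that interpolation is a linear operation can be seen in a toy setting. The sketch below assumes the ring $\mathbb{Z}_p[X]/(X^n - 1)$ with $p = 17$, $n = 4$ and $\omega = 4$ (a primitive 4th root of unity mod 17), where the slots are simply evaluations at powers of $\omega$. This is a simplification, not the cyclotomic ring $S$ of the text. Recovering the coefficients from the slots is then multiplication by one fixed inverse-DFT matrix.

```python
P = 17      # toy prime with P ≡ 1 (mod N_TOY)
N_TOY = 4   # toy ring degree
OMEGA = 4   # primitive 4th root of unity mod 17 (4^2 = -1 mod 17)


def to_slots(coeffs):
    """Slots of a(X) in Z_p[X]/(X^n - 1): the values a(OMEGA^j), j = 0..n-1."""
    return [sum(c * pow(OMEGA, i * j, P) for i, c in enumerate(coeffs)) % P
            for j in range(N_TOY)]


def inverse_dft_matrix():
    """Fixed matrix with entries n^{-1} * OMEGA^{-ij} mod p: interpolation is linear."""
    n_inv = pow(N_TOY, -1, P)
    w_inv = pow(OMEGA, -1, P)
    return [[n_inv * pow(w_inv, i * j, P) % P for j in range(N_TOY)]
            for i in range(N_TOY)]


def from_slots(slots):
    M = inverse_dft_matrix()
    return [sum(m * s for m, s in zip(row, slots)) % P for row in M]


a = [3, 1, 4, 1]
print(from_slots(to_slots(a)))  # recovers [3, 1, 4, 1]
```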

We select a more naive and simplistic approach. Suppose $\mathbf{x}$ is the vector which is
encoded by the input ciphertext. We first homomorphically compute

$$(b_1, \ldots, b_{\ell^{(R)}}) = \mathrm{replicate}(\mathbf{x}),$$

where replicate(x) is the Full Replication algorithm from [22]. This produces $\ell^{(R)}$
ciphertexts, the $i$-th of which encodes the constant polynomial over $R_p$ equal to the $i$-th
slot in $\mathbf{x}$. In [22] this is explained for the case where $\ell^{(R)} = N$, but the method clearly
works when $\ell^{(R)} < N$. The method requires time and depth $O(\log \log \ell^{(R)})$.

Given the output $b_1, \ldots, b_{\ell^{(R)}}$, which encodes the coefficients of the $\ell^{(R)}/n$ original
plaintext vectors, we can now apply $\iota$ (which, recall, is a linear map) to obtain any linear
function of the underlying plaintexts. For example, we could produce $\ell^{(R)}/n$ ciphertexts
each of which encodes one of the original plaintexts, or indeed a single ciphertext which
encodes all of them.

So, putting all of the sub-procedures for bootstrapping together, we find that we can
bootstrap $\ell^{(R)}/n$ ciphertexts in parallel using a procedure of depth

[log2  +  ^  +  riog2  -f  riog2  p\  +  riog2  riog2  p\  1  -f  riog2  A]  -f  0(log2  log2 

and  a  cost  of 

D{E,d)  +  \-  +  ■{M  +  E  +  2-M  ■  riog2Pl))  + 

multiplications,  where  d  «  (log2  (?)  •  (p  —  l)/(log2p),  E  =  ^  \  ' 

ELi  ki  ■  {h  +  1). 

7  Elliptic  Curves  Based  Variant 

We now extend our algorithm from representations in finite fields to representations in
elliptic curve groups. Recall that we need to embed $\mathbb{Z}_q^+$ into a group defined over $\mathbb{F}_p$ whose
operations can be expressed in terms of the functionality of the homomorphic encryption
scheme. This means that the range of the representation should be an algebraic
group. We have already seen linear algebraic groups (a.k.a. matrix representations) used
in this context in the work of Alperin-Sheriff and Peikert, so it is natural (to anyone
who has studied algebraic groups) to consider algebraic varieties. The finite field case
discussed in the previous sections corresponds to the genus zero case, so the next
natural extension is to examine the genus one case (a.k.a. elliptic curves).

The reason for doing this is that the values of $q$ in Table 2, compared with the estimated
values in Table 1, are far from optimal. This is because there are few possible group
orders of $\mathbb{F}_{p^k}^*$. The standard trick in this context (used, for example, in the ECM
factorization method, the ECPP primality prover, and indeed in all of elliptic curve
cryptography) is to replace the multiplicative group of a finite field by an elliptic curve
group.

Just as before, we select a coprime factorization $q = \prod_{i=1}^{t} e_i$ (with the $e_i$ not
necessarily prime, but pairwise coprime). But now we require that $e_i$ divides the order of
an elliptic curve $E_i$ defined over $\mathbb{F}_{p^{k_i}}$. Since the group orders of elliptic curves are
distributed roughly uniformly within the Hasse interval, it is highly likely that such
elliptic curves exist. Determining such curves may, however, be a hard problem for a
fixed value of $q$; a problem which arose previously in cryptography in [3]. However,
since we have some freedom in selecting $q$ in our scheme, we can select $q$ and the $E_i$
simultaneously, and hence finding the elliptic curves will not be a problem.
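A minimal sketch of how suitable curves might be located when $k_i = 1$: count points naively on curves $y^2 = x^3 + ax + b$ over a small prime field and stop at the first one whose order is divisible by a target $e$. The prime $p = 101$, the target $e = 7$ and the brute-force search are purely illustrative. Realistic field sizes would need a proper point-counting algorithm (such as SEA), which is not shown here.

```python
def count_points(a, b, p):
    """Number of points on y^2 = x^3 + a*x + b over F_p (p > 3 prime),
    including the point at infinity; naive O(p) count via Euler's criterion."""
    total = 1  # point at infinity
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        if rhs == 0:
            total += 1                         # single point (x, 0)
        elif pow(rhs, (p - 1) // 2, p) == 1:
            total += 2                         # rhs is a non-zero square: (x, +-y)
    return total


def find_curve(p, e):
    """First non-singular (a, b), in lexicographic order, with e | #E(F_p)."""
    for a in range(p):
        for b in range(p):
            if (4 * a ** 3 + 27 * b ** 2) % p == 0:
                continue  # singular cubic, not an elliptic curve
            order = count_points(a, b, p)
            if order % e == 0:
                return a, b, order
    return None


print(find_curve(101, 7))
```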

Again, we fix a polynomial representation of $\mathbb{F}_{p^{k_i}}$, i.e. an irreducible polynomial
$f_i(x)$ of degree $k_i$ such that $\mathbb{F}_{p^{k_i}} = \mathbb{F}_p[x]/f_i(x)$, and now we let $G_i \in E_i(\mathbb{F}_{p^{k_i}})$
denote a fixed point on the elliptic curve of order $e_i$. We can now translate our method
into this new setting. For example, Equation (1) translates to

$$\mathrm{rep}: \mathbb{Z}_q^+ \longrightarrow G = \prod_{i=1}^{t} E_i(\mathbb{F}_{p^{k_i}}), \qquad a \longmapsto ([a_1]G_1, \ldots, [a_t]G_t), \qquad (2)$$

where $a_i = a \pmod{e_i}$.

Homomorphic calculations in $G$ are then performed using Jacobian projective
coordinates. This means that general point addition can be performed with multiplicative
depth five and $M' = 16 \cdot M$ homomorphic multiplications. Our method then proceeds as
before, except that we replace homomorphic multiplication in $\mathbb{F}_{p^{k_i}}^*$ with Jacobian projective
point addition in $E_i(\mathbb{F}_{p^{k_i}})$.
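The depth-five, 16-multiplication claim can be checked against the standard Jacobian addition formulas (12 multiplications plus 4 squarings). In the sketch below, the hypothetical `Tracked` class stands in for a ciphertext: it records the multiplicative depth of each value and counts products. The sketch works over the toy field $\mathbb{F}_{97}$ (so $k = 1$ and $M = 1$) on the curve $y^2 = x^3 + 2x + 3$. It assumes the two inputs are distinct, non-inverse, finite points, and it is not the authors' implementation.

```python
P_FIELD = 97  # toy prime field F_97 (so k = 1 and M = 1)


class Tracked:
    """Element of F_97 that records its multiplicative depth.

    Hypothetical bookkeeping stand-in for a ciphertext: additions keep the
    larger input depth, products add one level and bump Tracked.mults.
    """
    mults = 0

    def __init__(self, v, depth=0):
        self.v = v % P_FIELD
        self.depth = depth

    def __add__(self, other):
        return Tracked(self.v + other.v, max(self.depth, other.depth))

    def __sub__(self, other):
        return Tracked(self.v - other.v, max(self.depth, other.depth))

    def __mul__(self, other):
        Tracked.mults += 1
        return Tracked(self.v * other.v, max(self.depth, other.depth) + 1)


def jacobian_add(P1, P2):
    """(X1:Y1:Z1) + (X2:Y2:Z2), where (X:Y:Z) stands for (X/Z^2, Y/Z^3).

    Standard general-addition formulas (12 multiplications, 4 squarings);
    valid only when P1 != +-P2 and neither point is at infinity.
    """
    X1, Y1, Z1 = P1
    X2, Y2, Z2 = P2
    Z1Z1, Z2Z2 = Z1 * Z1, Z2 * Z2
    U1, U2 = X1 * Z2Z2, X2 * Z1Z1
    S1, S2 = (Y1 * Z2) * Z2Z2, (Y2 * Z1) * Z1Z1
    H, R = U2 - U1, S2 - S1
    HH = H * H
    HHH = H * HH
    V = U1 * HH
    X3 = R * R - HHH - (V + V)
    Y3 = R * (V - X3) - S1 * HHH
    Z3 = (Z1 * Z2) * H
    return X3, Y3, Z3


def to_jacobian(x, y, z):
    """Affine (x, y) represented with an arbitrary non-zero Z coordinate z."""
    return Tracked(x * z * z), Tracked(y * z ** 3), Tracked(z)


def to_affine(X, Y, Z):
    z_inv = pow(Z.v, -1, P_FIELD)
    return X.v * z_inv ** 2 % P_FIELD, Y.v * z_inv ** 3 % P_FIELD


def affine_add(p1, p2):
    """Reference chord addition in affine coordinates (x1 != x2)."""
    (x1, y1), (x2, y2) = p1, p2
    lam = (y2 - y1) * pow(x2 - x1, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD


# (3, 6) and (0, 10) lie on the toy curve y^2 = x^3 + 2x + 3 over F_97.
Tracked.mults = 0
S = jacobian_add(to_jacobian(3, 6, 1), to_jacobian(0, 10, 5))
result_affine = to_affine(*S)
mults_used = Tracked.mults
depth_used = max(c.depth for c in S)
print(result_affine, mults_used, depth_used)  # (85, 71) 16 5
```

Over $\mathbb{F}_{p^k}$ each of these 16 multiplications would itself cost $M$ multiplications over $\mathbb{F}_p$, which is where $M' = 16 \cdot M$ comes from.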

The computation of red is then performed as follows. We first homomorphically
map the projective points in $G$ to affine points. Each such conversion, in component
$i$, requires an $\mathbb{F}_{p^{k_i}}$-field inversion and three $\mathbb{F}_{p^{k_i}}$-field multiplications. Let $\mathrm{DInv}_i$
(resp. $\mathrm{MInv}_i$) denote the depth (resp. the number of multiplications in $\mathbb{F}_p$) of the circuit to
invert in the field $\mathbb{F}_{p^{k_i}}$. Then the conversion of a set of projective points in
$G$ to a set of affine points requires depth $3 + \max_i \mathrm{DInv}_i$ and $4 \cdot M + \sum_{i=1}^{t} \mathrm{MInv}_i$

homomorphic multiplications over $\mathbb{F}_p$.

Given this final conversion to affine form, we effectively have $E' = E + t$ variables, as
opposed to $E$, defining the elements in $G$. The extra $t$ variables come from
the $y$-coordinates; it is clear we only need to store $t$ such variables, as opposed to $E$,
since each $x$-coordinate corresponds to at most two $y$-coordinates, and hence a
naive form of homomorphic point compression can be applied.

This  means  the  map  red  (after  the  conversion  to  affine  coordinates  so  as  to  reduce 
the  multiplicative  complexity  of  the  interpolated  polynomial)  can  be  expressed  as  a 
degree  d'  map;  where  we  expect  d'  to  be  the  smallest  d'  such  that  ®  Gd'  >  q,  which 
means  we  expect  d'  ~  E'  ■  iog(£^')  _  i).  This  means,  as  before,  that  the  resulting 

depth will be $\lceil \log_2 d' \rceil$ and the number of multiplications will be $D(E', d')$.

So, putting all of the sub-procedures for bootstrapping together, we find that we can
use the elliptic curve variant of our bootstrapping method to bootstrap $\ell^{(R)}/n$
ciphertexts in parallel using a procedure of depth

(loga  d'l  +  5  •  Q  +  [logs  +  •  [logs p\  +  ■  [logs  [loga p\  1  +  llog^  A] ^ 

-1-3-1-  max  DInvj  -I-  0(log2  log2 

i—1 

and 

D{E',  d')  +  X-  [m'  +  -{M'  +  ?,-E  +  2-M'  ■  riog2Pl )) 

t 

+  4-  M  +  Emi  nvi  + 

i=l 
multiplications,  where  d'  «  log  q/ log  S',  E'  =  +  1),  M  =  X]i=i  '  (^i  +

l)/2  and  M'  =  16  •  M.  Note  the  3  •  E  term  comes  from  needing  to  rotate  the  three 
projective  coordinates. 

However,  the  ability  to  use  arbitrary  q  comes  at  a  penalty;  the  depth  required  has 
dramatically  increased  due  to  the  elliptic  curve  group  operations.  For  example  if  we 
consider  a  prime  p  of  size  roughly  2^®  and  k  =  2,  then  we  need  about  200  levels,  as 
opposed to 56 with the finite field variant. This then strongly influences the required
value of N, pushing it up from around 85,000 to 220,000. Thus in practice the elliptic
curve  variant  is  unlikely  to  be  viable. 
A Parameter Calculation

In  [20]  a  concrete  set  of  parameters  for  the  BGV  SHE  scheme  was  given  for  the  case  of 
binary  message  spaces,  and  arbitrary  L.  In  [12]  this  was  adapted  to  the  case  of  message 
space  Rp  for  2-power  cyclotomic  rings,  but  only  for  the  schemes  which  could  support 
one  level  of  multiplication  gates  (i.e.  for  L  =  1).  In  [11]  these  two  approaches  were 
combined,  for  arbitrary  L  and  p,  and  the  analysis  was  (slightly)  modified  to  remove 
the need for a modulus switch upon encryption. In this section we modify the analysis
of [11] again, to present an analysis which includes a step of field switching from [17].
We  assume  in  this  section  that  the  reader  is  familiar  with  the  analysis  and  algorithms 
from  [20,11,17]. 

Our analysis will make extensive use of the following fact: if $a \in R$ is chosen from
a distribution such that the coefficients are distributed with mean zero and standard
deviation $\sigma$, and $\zeta_m$ is a primitive $m$-th root of unity, then we can use $6 \cdot \sigma$ to bound $a(\zeta_m)$
and hence the canonical embedding norm of $a$. If we have two elements with variances
$\sigma_1^2$ and $\sigma_2^2$, then we can bound the canonical norm of their product with $16 \cdot \sigma_1 \cdot \sigma_2$.


Ensuring We Can Evaluate the Required Depth: Recall we have two rings $R$ and
$S$ of degree $N$ and $n$ respectively. The ring $S$ is a subring of $R$ and hence $n$ divides
$N$. We require a chain of moduli $q_0 < q_1 < \cdots < q_L$ corresponding to each level of
the scheme. We assume (for the sake of simplicity) that the ratios $q_i/q_{i-1} = p_i$ are primes. Thus
$q_L = q_0 \cdot \prod_{i=1}^{L} p_i$. Also note that, as in [11], we apply a SHE.LowerLevel (a.k.a.
modulus switch) algorithm before a multiplication operation. This often leads to lower
noise values in practice (which a practical instantiation can make use of). In addition, it
eliminates the need to perform a modulus switch after encryption, which happened in
[20].

We utilize the following constants described in [12], which are worked out for the
case of a message space defined modulo $p$ (the constants in [12] make use of an additional
parameter arising from the key generation procedure; in our case we can take this
constant equal to one). In the following, $h$ is the Hamming weight of the secret keys
$s^{(R)}$ and $s^{(S)}$.

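$$B_{\mathrm{Clean}} = N \cdot p/2 + p \cdot \sigma \cdot \left(\frac{16 \cdot N}{\sqrt{2}} + 6 \cdot \sqrt{N} + 16 \cdot \sqrt{h \cdot N}\right),$$

$$B^{(R)}_{\mathrm{KS}} = p \cdot \sigma \cdot N \cdot \left(1.49 \cdot \sqrt{h \cdot N} + 2.11 \cdot h + 5.54 \cdot \sqrt{h} + 1.96 \cdot \sqrt{N} + 4.62\right),$$

$$B^{(R)}_{\mathrm{Scale}} = p \cdot \sqrt{3 \cdot N} \cdot \left(1 + \frac{8}{3} \cdot \sqrt{h}\right),$$

$$B^{(S)}_{\mathrm{Scale}} = p \cdot \sqrt{3 \cdot n} \cdot \left(1 + \frac{8}{3} \cdot \sqrt{h}\right),$$

$$B^{(S)}_{\mathrm{KS}} = p \cdot \sigma \cdot n \cdot \left(1.49 \cdot \sqrt{h \cdot n} + 2.11 \cdot h + 5.54 \cdot \sqrt{h} + 1.96 \cdot \sqrt{n} + 4.62\right).$$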

As in [20], we define a small “wiggle room” $\xi$, which we set equal to eight; this
enables a number of additions to be performed without needing to account for them
individually in our analysis. These constants arise in the following way:

- A freshly encrypted ciphertext at level $L$ has noise bounded by $B_{\mathrm{Clean}}$.

- In the worst case, when applying SHE.LowerLevel to a (big ring) ciphertext at level
$l > L_2 + 1$ with noise bounded by $B'$, one obtains a new ciphertext at level $l - 1$
with noise bounded by

$$\frac{B'}{p_l} + B^{(R)}_{\mathrm{Scale}}.$$

- In the worst case, when applying SHE.LowerLevel to a (small ring) ciphertext at
level $l < L_2 + 1$ with noise bounded by $B'$, one obtains a new ciphertext at level
$l - 1$ with noise bounded by

$$\frac{B'}{p_l} + B^{(S)}_{\mathrm{Scale}}.$$

- When applying the tensor product multiplication operation to (big ring) ciphertexts
of a given level $l > L_2 + 1$ with noise $B_1$ and $B_2$, one obtains a new ciphertext with
noise given by

$$B_1 \cdot B_2 + B^{(R)}_{\mathrm{KS}} \cdot \frac{q_l}{P_R} + B^{(R)}_{\mathrm{Scale}},$$

where $P_R$ is a value to be determined later.

- When applying the tensor product multiplication operation to (small ring)
ciphertexts of a given level $l \le L_2$ with noise $B_1$ and $B_2$, one obtains a new ciphertext with
noise given by

$$B_1 \cdot B_2 + B^{(S)}_{\mathrm{KS}} \cdot \frac{q_l}{P_S} + B^{(S)}_{\mathrm{Scale}},$$

where again $P_S$ is a value to be determined later.

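A general evaluation procedure begins with a freshly encrypted ciphertext at level
$L$ with noise $B_{\mathrm{Clean}}$. When entering the first multiplication operation we first apply a
SHE.LowerLevel operation to reduce the noise to a universal bound $B^{(R)}$, whose value
will be determined later. We therefore require

$$\frac{\xi \cdot B_{\mathrm{Clean}}}{p_L} + B^{(R)}_{\mathrm{Scale}} < B^{(R)}, \quad \text{i.e.} \quad p_L > \frac{\xi \cdot B_{\mathrm{Clean}}}{B^{(R)} - B^{(R)}_{\mathrm{Scale}}}. \qquad (3)$$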
We now turn to dealing with the SHE.LowerLevel operations which occur before a
multiplication gate at level $l \in \{1, \ldots, L-1\} \setminus \{L_2 + 1\}$. In what follows we assume
$l > L_2 + 1$; to obtain the equations for $l \le L_2$ one simply replaces the $R$-constants
by their equivalent $S$-constants. We perform a worst-case analysis and assume that
the input ciphertexts are at level $l$. We can then assume that the input to the tensoring
operation in the previous multiplication gate (just after the previous SHE.LowerLevel)
was bounded by $B^{(R)}$, and so the output noise from the previous multiplication gate for
each input ciphertext is bounded by $(B^{(R)})^2 + B^{(R)}_{\mathrm{KS}} \cdot q_l/P_R + B^{(R)}_{\mathrm{Scale}}$. This means the
noise on entering the SHE.LowerLevel operation is bounded by $\xi$ times this value, and
so to maintain our invariant we require

$$\frac{\xi \cdot (B^{(R)})^2}{p_l} + \frac{\xi \cdot B^{(R)}_{\mathrm{KS}} \cdot q_l}{P_R \cdot p_l} + \frac{\xi \cdot B^{(R)}_{\mathrm{Scale}}}{p_l} + B^{(R)}_{\mathrm{Scale}} \le B^{(R)}.$$

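Rearranging this into a quadratic equation in $B^{(R)}$, we have

$$\frac{\xi}{p_l} \cdot (B^{(R)})^2 - B^{(R)} + \xi \cdot B^{(R)}_{\mathrm{KS}} \cdot \frac{q_{l-1}}{P_R} + B^{(R)}_{\mathrm{Scale}} \cdot \left(1 + \frac{\xi}{p_l}\right) \le 0.$$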


We denote the constant term in this equation by $R_{l-1}$. We now assume that all primes
$p_l$ are of roughly the same size (for the ring $R$), and note that we need only satisfy
the inequality for the largest modulus, $l = L - 1$ (resp. $l = L_2$ for the ring $S$). We now
fix $R_{L-2}$ by trying to ensure that $R_{L-2}$ is close to $B^{(R)}_{\mathrm{Scale}} \cdot (1 + \xi/p_{L-1}) \approx B^{(R)}_{\mathrm{Scale}}$;

we set $R_{L-2} = (1 + 2^{-3}) \cdot B^{(R)}_{\mathrm{Scale}} \cdot (1 + \xi/p_{L-1})$, and obtain


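$$P_R \approx \frac{8 \cdot \xi \cdot B^{(R)}_{\mathrm{KS}} \cdot q_{L-2}}{B^{(R)}_{\mathrm{Scale}}}, \qquad (4)$$

since $B^{(R)}_{\mathrm{Scale}} \cdot (1 + \xi/p_{L-1}) \approx B^{(R)}_{\mathrm{Scale}}$. Similarly, for the small ring we find

$$P_S \approx \frac{8 \cdot \xi \cdot B^{(S)}_{\mathrm{KS}} \cdot q_{L_2-1}}{B^{(S)}_{\mathrm{Scale}}}. \qquad (5)$$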


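To ensure we have a solution we require $1 - 4 \cdot \xi \cdot R_{L-2}/p_{L-1} \ge 0$ (resp. $1 - 4 \cdot \xi \cdot R_{L_2-1}/p_{L_2} \ge 0$),
which implies we should take

$$p_i \approx 4 \cdot \xi \cdot R_{L-2} \approx 32 \cdot B^{(R)}_{\mathrm{Scale}} =: p_R \quad \text{for } i = L_2 + 2, \ldots, L - 1,$$

$$p_i \approx 4 \cdot \xi \cdot R_{L_2-1} \approx 32 \cdot B^{(S)}_{\mathrm{Scale}} =: p_S \quad \text{for } i = 1, \ldots, L_2.$$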
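Assuming the invariant takes the quadratic form $(\xi/p_l) \cdot B^2 - B + R_{l-1} \le 0$, the feasibility condition $1 - 4 \xi R_{l-1}/p_l \ge 0$ and the smallest admissible noise bound can be computed directly, as in the Python sketch below (the numbers are arbitrary). With $p_l = 4 \xi R_{l-1}$ the discriminant vanishes and the bound is exactly $2 R_{l-1}$. This is in line with taking the universal bound to be about twice $B_{\mathrm{Scale}}$ when $R_{l-1}$ is close to $B_{\mathrm{Scale}}$.

```python
import math


def smallest_noise_bound(xi, p_l, R):
    """Smallest B with (xi / p_l) * B**2 - B + R <= 0, or None if none exists.

    Feasibility needs the discriminant 1 - 4 * xi * R / p_l to be non-negative.
    """
    disc = 1 - 4 * xi * R / p_l
    if disc < 0:
        return None
    return p_l * (1 - math.sqrt(disc)) / (2 * xi)


xi, R = 8, 1000.0
print(smallest_noise_bound(xi, 4 * xi * R, R))  # discriminant is zero: B = 2 * R
```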


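We now examine what happens at level $L_2 + 1$ when we perform a ring switch operation.
Following Lemma 3.2 of [17], we know the noise increases by a factor of $(p/2) \cdot \sqrt{N/n}$.
The noise output from the previous multiplication gate is bounded by $(B^{(R)})^2 + B^{(R)}_{\mathrm{KS}} \cdot q_{L_2+2}/P_R + B^{(R)}_{\mathrm{Scale}}$. Note that

$$B^{(R)}_{\mathrm{KS}} \cdot \frac{q_{L_2+2}}{P_R} \approx B^{(R)}_{\mathrm{KS}} \cdot q_{L_2+2} \cdot \frac{B^{(R)}_{\mathrm{Scale}}}{8 \cdot \xi \cdot B^{(R)}_{\mathrm{KS}} \cdot q_{L-2}} = \frac{B^{(R)}_{\mathrm{Scale}}}{8 \cdot \xi \cdot p_R^{L-L_2-4}}.$$

Thus we know that the noise after the ring switch operation is bounded by

$$B_{\mathrm{RingSwitch}} = \frac{p}{2} \cdot \sqrt{\frac{N}{n}} \cdot \left((B^{(R)})^2 + B^{(R)}_{\mathrm{KS}} \cdot \frac{q_{L_2+2}}{P_R} + B^{(R)}_{\mathrm{Scale}}\right).$$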


We now modulus switch down to level $L_2$, and obtain a ciphertext (over the ring $S$)
with noise bounded by

$$\frac{B_{\mathrm{RingSwitch}}}{p_{L_2+1}} + B^{(S)}_{\mathrm{Scale}}. \qquad (6)$$

We would like this to be less than the universal bound $B^{(S)}$, which implies

$$p_{L_2+1} > \frac{B_{\mathrm{RingSwitch}}}{B^{(S)} - B^{(S)}_{\mathrm{Scale}}}. \qquad (7)$$


We now need to estimate the size of $p_0$. Due to the above choices, the ciphertext to which
we apply the bootstrapping has norm bounded by $B^{(S)}$. This means that we require

$$q_0 = p_0 > 2 \cdot B^{(S)} \cdot c_{m'}, \qquad (8)$$

to ensure a valid decryption/bootstrapping procedure. Recall that $c_{m'}$ is the ring constant for
the polynomial ring $S$ and that it depends only on $m'$ (see [13] for details).


Ensuring We Have Security: Works before [31,23], such as Lindner and Peikert
[24], did not take the rank of the lattice into account when estimating the cost of the
attacker. The reason is that the lattice rank appears to be only a second-order term in
the cost of the attack. However, for applications such as FHE, the dimension is usually
very  big,  e.g.  2^®,  and  lattice  algorithms  are  often  polynomial  in  the  rank.  Therefore, 
even as a second-order term it can contribute significantly to the cost of the attack.
The largest modulus used in our big ring (resp. small ring) key switching matrices, i.e.
the largest modulus used in an LWE instance, is given by $Q_{L-1} = P_R \cdot q_{L-1}$ (resp.
$Q_{L_2} = P_S \cdot q_{L_2}$).

We recall the approach of [31,23] here. First, fix some security level as measured
in enumeration nodes, e.g. 2^^®. Next, estimates by Chen and Nguyen [9] are used
to determine the cost of running BKZ 2.0 for various block sizes $\beta$. Combining this
with the security level gives an upper bound on the number of rounds an attacker can perform,
depending on $\beta$. Then, for various lattice dimensions $r$, the BKZ 2.0 simulator by Chen
and Nguyen is used to determine the quality of the vector as measured by the
root-Hermite factor $\delta(\beta, r) = (\|\mathbf{b}\|/\mathrm{vol}(L)^{1/r})^{1/r}$. Now, the best possible root-Hermite
factor achievable by the attacker is given by $\delta(r) = \min_\beta \delta(\beta, r)$.
In LWE, the relevant parameters for security are the ring dimension $n$ (resp. $N$),
the modulus $Q = Q_{L_2}$ (resp. $Q = Q_{L-1}$) and the standard deviation $\sigma$. Note that in
most scenarios, an adversary can choose how many LWE samples he uses in his attack.
This number $r$ is equal to the rank of the lattice. The distinguishing attack against LWE
uses a short vector in the dual SIS lattice to distinguish the LWE distribution from the
uniform distribution. More precisely, an adversary can distinguish between these two
distributions with distinguishing advantage $\epsilon$ if the shortest vector he can obtain (in
terms of its root-Hermite factor) satisfies

$$\delta(r)^r \cdot Q^{n/r - 1} \cdot \sigma < \sqrt{-\log(\epsilon)/\pi}.$$


It follows that, in order for our system to be secure against the previously described
adversary, we need

$$\log_2(Q) \le \min_{r > n} \frac{r^2 \cdot \log_2(\delta(r)) + r \cdot \log_2(\sigma/\alpha)}{r - n}, \qquad (9)$$

where $\alpha = \sqrt{-\log(\epsilon)/\pi}$. See also [27,24,23] for more information. For every $n$ we can
now compute an upper bound on $\log_2(Q)$ by evaluating the right-hand side of Equation (9)
over $r$ and selecting the minimum.
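A rough evaluation of Equation (9) can be sketched as follows. The main simplifying assumption is that the root-Hermite factor $\delta$ is a constant independent of $r$, whereas the text obtains $\delta(r)$ from the BKZ 2.0 simulator. The choices $\sigma = 3.2$ and $\epsilon = 2^{-64}$, the natural logarithm inside $\alpha$, and the search range for $r$ are also assumptions of this sketch.

```python
import math


def log2_q_upper_bound(n, delta, sigma=3.2, eps=2.0 ** -64, r_factor=10):
    """Right-hand side of Equation (9), minimised over n < r < r_factor * n.

    Assumption: delta is a constant root-Hermite factor (the text instead uses
    delta(r) from the BKZ 2.0 simulator).
    """
    alpha = math.sqrt(-math.log(eps) / math.pi)  # natural log assumed here
    a, b = math.log2(delta), math.log2(sigma / alpha)
    best = math.inf
    for r in range(n + 1, r_factor * n):
        best = min(best, (r * r * a + r * b) / (r - n))
    return best


print(round(log2_q_upper_bound(1024, 1.0055), 1))
```

As expected, a larger dimension permits a larger modulus, and a stronger attacker (smaller $\delta$) forces a smaller one.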


Putting it all together: As in [20,12], we set $\sigma = 3.2$, $B^{(S)} = 2 \cdot B^{(S)}_{\mathrm{Scale}}$ and
$B^{(R)} = 2 \cdot B^{(R)}_{\mathrm{Scale}}$. From our equations (3), (4), (5), (6), (7) and (8) we obtain equations for $p_i$
for $i = 0, \ldots, L$, and for $P_R$ and $P_S$, in terms of $n$, $N$, $L$, $h$ and the security level $\kappa$.


B  Example  Parameters 

In Appendix A we present a calculation of suitable parameters for our scheme, and
the resulting complexity of the polynomial representation of red; here we work out a
concrete set of parameters for various plaintext moduli $p$.

We target $\kappa = 128$ bits of security, and set the Hamming weight $h$ of the secret key
to be 64 as in [20,12]. On input $N$ and $n$ to the formulae in Appendix A we obtain
upper bounds on $\log(q_{L-1})$ and $\log(q_{L_2})$. We now use equations (3)-(8) from the
Appendix for different values of the plaintext modulus $p$ to obtain lower bounds on
$\log(q_{L-1})$ and $\log(q_{L_2})$. Then, we increase $N$ and $n$ until the lower bounds on $q_{L-1}$
and $q_{L_2}$ from the functionality are below the upper bounds from the security analysis. In
this way we obtain lower bounds for $N$ and $n$.

In Table 1 we consider four different values of $p$; for simplicity we also set $t = 1$ in
(1), i.e. $G = \mathbb{F}_{p^k}^*$, for a suitable choice of $k$. After finding approximate values for $N$, $n$
and $q$, we can then search for exact values of $N$, $n$ and $q$. More precisely, we are looking
for cyclotomic rings $R$ and $S$ such that the degree $N = \phi(m)$ of $F(X) = \Phi_m(X)$ and the
degree $n = \phi(m')$ of $f(X) = \Phi_{m'}(X)$ are larger than the bounds above, and $n$ divides both $N$
and $\ell^{(R)}$ (the number of plaintext slots associated with $R$). In addition we require that
$q$ divides $p^k - 1$. See Table 2 for some values.
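Since $G = \mathbb{F}_{p^k}^*$ has order $p^k - 1$, the smallest admissible $k$ for a given pair $(p, q)$ is the multiplicative order of $p$ modulo $q$. A minimal Python sketch (the function name is hypothetical): for example, $p = 2$ and $q = 65535 = 2^{16} - 1$ give $k = 16$.

```python
import math


def smallest_k(p, q, k_max=10_000):
    """Smallest k >= 1 with q | p^k - 1, i.e. the multiplicative order of p mod q.

    Returns None if gcd(p, q) != 1 or if no such k <= k_max exists.
    """
    if q < 2 or math.gcd(p, q) != 1:
        return None
    x = 1
    for k in range(1, k_max + 1):
        x = x * p % q
        if x == 1:
            return k
    return None


print(smallest_k(2, 65535))
```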

Notice that the value of $q$ is strongly influenced by the ring constant $c_{m'}$. In Table 1
we set $c_{m'} = 1.28$ (i.e. we assume the best case of $m'$ being prime), whereas in Table 2
Table 1. Lower bounds on N and n

p        κ     c = ℓ/n        n      N        q ≈
2        128   1              860    23100    11637
               2                     24100
?        128   1              1040   51800    1635087
               2                     53100
               3, 4                  56000
               [5, ..., 10]          57600
?        128   1              1300   96000    467989106
               2, 3                  98500
               [4, ..., 10]          103000
≈ 2^?    128   1              1750   181000   3.558467651 · 10^?
               2                     183000
               [3, ..., 10]          185000

Table 2. A concrete set of cyclotomic rings with an estimation of the number of multiplications
and the depth required to perform our bootstrapping step

p          m        N = φ(m)   m'     n = φ(m')   c_m'   c   k    L    # Mults         q
2          31775    24000      1271   1200        3.93   1   16   ?    ≈ 8.3 · 10^?    65535
           32767    27000      1057   900         2.69   2   15   23   ≈ 1.02 · 10^?   32767
2^8 + 1    62419    51840      1687   1440        2.72   1   ?    40   ≈ 4.6 · 10^?    4243648
           91149    58080      1321   1320        1.28   1   3    39   ≈ 2.3 · 10^?    2121824
           137384   63360      1321   1320        1.28   4   3    41   ≈ 3.5 · 10^?    2121824
2^16 + 1   113993   100800     2651   2400        2.9    1   ?    ?    ≈ 1.5 · 10^?    2147549184
           160977   102608     2333   2332        1.28   2   2    58   ≈ 6.3 · 10^?    715849728
           272200   108800     1361   1360        1.28   4   2    57   ≈ 4.8 · 10^?    536887296
2^? + 15   198203   183040     2227   2080        3.6    1   ?    79   ≈ 1.1 · 10^?    414161297767368
           202051   199872     2083   2082        1.28   4   2    79   ≈ 3.9 · 10^?    50637664608480
           352317   190512     2649   1764        1.81   6   2    82   ≈ 5.1 · 10^?    50637664608480

we compute the actual value of the ring constant for each cyclotomic ring we consider.
For example, for $p = 2$, in Table 1 we obtain an approximate value $q \approx 11637$, but in
Table 2 we need a larger value due to the additional condition that $q$ divides $p^k - 1$, and
due to the ring constant, which is bigger than 1.27 for $m' = 1271$ and $m' = 1057$.




APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


Multiparty Computation from Somewhat Homomorphic Encryption


Ivan Damgård¹, Valerio Pastro¹, Nigel Smart², and Sarah Zakarias¹

¹ Department of Computer Science, Aarhus University
² Department of Computer Science, Bristol University


Abstract. We propose a general multiparty computation protocol secure against an active adversary
corrupting up to $n - 1$ of the $n$ players. The protocol may be used to compute securely arithmetic circuits
over any finite field $\mathbb{F}_{p^k}$. Our protocol consists of a preprocessing phase that is independent both of the
function to be computed and of the inputs, and a much more efficient online phase where the actual
computation takes place. The online phase is unconditionally secure and has total computational (and
communication) complexity linear in $n$, the number of players, where earlier work was quadratic in $n$.
Moreover, the work done by each player is only a small constant factor larger than what one would
need to compute the circuit in the clear. We show this is optimal for computation in large fields. In
practice, for 3 players, a secure 64-bit multiplication can be done in 0.05 ms. Our preprocessing is based
on a somewhat homomorphic cryptosystem. We extend a scheme by Brakerski et al., so that we can
perform distributed decryption and handle many values in parallel in one ciphertext. The computational
complexity of our preprocessing phase is dominated by the public-key operations; we need $O(n^2/s)$
operations per secure multiplication, where $s$ is a parameter that increases with the security parameter
of the cryptosystem. Earlier work in this model needed $\Omega(n^2 \kappa)$ operations. In practice, the preprocessing
prepares a secure 64-bit multiplication for 3 players in about 13 ms.


1  Introduction 

A  central  problem  in  theoretical  cryptography  is  that  of  secure  multiparty  computation  (MPC). 
In this problem $n$ parties, holding private inputs $x_1, \ldots, x_n$, wish to compute a given function
$f(x_1, \ldots, x_n)$. A protocol for doing this securely should be such that honest players get the correct
result and this result is the only new information released, even if some subset of the players is
controlled by an adversary.

In the case of dishonest majority, where more than half the players are corrupt, unconditionally
secure protocols cannot exist. Under computational assumptions, it was shown in [8] how to
construct UC-secure MPC protocols that handle the case where all but one of the parties are actively
corrupted. The public-key machinery one needs for this is typically expensive, so efficient solutions
are hard to design for dishonest majority. Recently, however, a new approach has been proposed
making such protocols more practical. This approach works as follows: one first designs a general
MPC protocol in the preprocessing model, where access to a “trusted dealer” is assumed. The dealer
does not need to know the function to be computed, nor the inputs; he just supplies raw material
for the computation before it starts. This allows the “online” protocol to use only cheap
information-theoretic primitives and hence be efficient. Finally, one implements the trusted dealer by a secure
protocol using public-key techniques; this protocol can then be run in a preprocessing phase. The
current state of the art in this respect are the protocols in Bendlin et al., Damgård/Orlandi and
Nielsen et al. [5, 13, 25]. The “MPC-in-the-head” technique of Ishai et al. [18, 17] has similar overall
asymptotic complexity, but larger constants and a less efficient online phase.

Recently, another approach has become possible with the advent of Fully Homomorphic
Encryption (FHE) by Gentry [15]. In this approach all parties first encrypt their input under the
FHE scheme; then they evaluate the desired function on the ciphertexts using the homomorphic
properties, and finally they perform a distributed decryption on the final ciphertexts to get the
results. The advantage of the FHE-based approach is that interaction is only needed to supply
inputs and get output. However, the low bandwidth consumption comes at a price; current FHE
schemes are very slow and can only evaluate small circuits, i.e., they actually only provide what
is known as somewhat homomorphic encryption (SHE). This can be circumvented in two ways:
either by assuming circular security and implementing an expensive bootstrapping operation, or
by extending the parameter sizes to enable a “levelled FHE” scheme which can evaluate circuits of
large degree (exponential in the number of levels) [6]. The main cost, much like other approaches, is
in terms of the number of multiplications in the arithmetic circuit. So, whilst theoretically appealing,
the approach via FHE is not competitive in practice with the traditional MPC approach.


1.1  Contributions  of  this  paper. 

Optimal Online Phase. We propose an MPC protocol in the preprocessing model that computes
securely an arithmetic circuit C over any finite field $\mathbb{F}_{p^k}$. The protocol is statistically UC-secure
against active and adaptive corruption of up to $n-1$ of the $n$ players, and we assume synchronous
communication and secure point-to-point channels. Measured in elementary operations in $\mathbb{F}_{p^k}$, the
total amount of work done is $O(n \cdot |C| + n^3)$, where $|C|$ is the size of C. All earlier work in this model
had complexity $\Omega(n^2 \cdot |C|)$. A similar improvement applies to the communication complexity and
the amount of data one needs to store from the preprocessing. Hence, the work done by each player
in the online phase is essentially independent of n. Moreover, it is only a small constant factor
larger than what one would need to compute the circuit in the clear. This is the first protocol in
the preprocessing model with these properties¹.

Finally, we show a lower bound implying that, with respect to the amount of data required from the
preprocessing, our protocol is optimal up to a constant factor. We also obtain a similar lower
bound on the number of bit operations required, and hence the computational work done in our
protocol is optimal up to poly-logarithmic factors.

All results mentioned here hold for the case of large fields, i.e., where the desired error probability
is $(1/p^k)^c$, for a small constant c. Note that many applications of MPC need integer arithmetic,
modular reductions, conversion to binary, etc., which we can emulate by computing in $\mathbb{F}_p$ with p
large enough to avoid overflow. This naturally leads to computing with large fields. As mentioned,
our protocol works for all fields, but like earlier work in this model it is less efficient for small fields
by a factor of essentially $\lceil \mathsf{sec}/\log p^k \rceil$ for error probability $2^{-\mathsf{sec}}$; see Appendix A.4 for details.

Obtaining our result requires new ideas compared to [5], which was previously state of the art
and was based on additive secret sharing where each share of a secret is authenticated using an
information-theoretic Message Authentication Code (MAC). Since each player needs to have his
own key, each of the n shares needs to be authenticated with n MACs, so this approach is inherently
quadratic in n. Our idea is to authenticate the secret value itself instead of the shares, using a
single global key. This seems to lead to a “chicken and egg” problem, since one cannot check a
MAC without knowing the key, but if the key is known, MACs can be forged. Our solution to this
involves secret sharing the key as well, carefully timing when values are revealed, and various tricks
to reduce the amortized cost of checking a set of MACs.

¹ With dishonest majority, successful termination cannot be guaranteed, so our protocols simply abort if cheating is
detected. We do not, however, identify who cheated; indeed, the standard definition of secure function evaluation
does not require this. Identification of cheaters is possible, but we do not know how to do this while maintaining
complexity linear in n.

Efficient use of FHE for MPC. As a conceptual contribution we propose what we believe is “the
right” way to use FHE/SHE for computationally efficient MPC, namely to use it for implementing
a preprocessing phase. The observation is that since such preprocessing is typically based on the
classic circuit randomization technique of Beaver [3], it can be done by evaluating in parallel many
small circuits of small multiplicative depth (in fact depth 1 in our case). Thus SHE suffices, we do
not need bootstrapping, and we can use the SHE SIMD approach of [28] to handle many values in
parallel in a single ciphertext.

To capitalize on this idea, we apply the SIMD approach to the cryptosystem from [7] (see also
[16], where this technique is also used). To get the best performance, we need to do a non-trivial
analysis of the parameter values we can use, and we prove some results on norms of embeddings
of a cyclotomic field for this purpose. We also design a distributed decryption procedure for our
cryptosystem. This protocol is only robust against passive attacks. Nevertheless, this is sufficient for
the overall protocol to be actively secure. Intuitively, this is because the only damage the adversary
can do is to add a known error term to the decryption result obtained. The effect of this for the
online protocol is that certain shares of secret values may be incorrect, but this will be caught by
the check involving the MACs. Finally, we adapt a zero-knowledge proof of plaintext knowledge
from [5] for our purpose, and in particular we improve the analysis of the soundness guarantees it
offers. This influences the choice of parameters for the cryptosystem and therefore improves overall
performance.

An Efficient Preprocessing Protocol. As a result of the above, we obtain a constant-round preprocessing
protocol that is UC-secure against active and static corruption of $n-1$ players, assuming
the underlying cryptosystem is semantically secure, which follows from the polynomial LWE (PLWE)
assumption. UC-security for dishonest majority cannot be obtained without a set-up assumption.
In this paper we assume that a key pair for our cryptosystem has been generated and the secret
key has been shared among the players.

Whereas previous work in the preprocessing/online model [5,13] uses $\Omega(n^2)$ public-key operations
per secure multiplication, we only need $O(n^2/s)$ operations, where s is a number that grows
with the security parameter of the SHE scheme (we have $s \approx 12000$ in our concrete instantiation
for computing in $\mathbb{F}_p$ where $p \approx 2^{64}$). We stress that our adapted scheme is exactly as efficient as the
basic version of [7] that does not allow this optimization, so the improvement is indeed “genuine”.

In comparison to the approach mentioned above where one uses FHE throughout the protocol,
our combined preprocessing and online phase achieves a result that is incomparable from a theoretical
point of view, but much more practical: we need more communication and rounds, but the
computational overhead is much smaller. We need $O(n^2/s \cdot |C|)$ public-key operations compared
to $O(n \cdot |C|)$ for the FHE approach, where for realistic values of n and s we have $n^2/s \ll n$.
Furthermore, we only need a low-depth SHE scheme, which is much more efficient in the first place. And
finally, we can push all the work using SHE into a function-independent preprocessing phase.
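As a rough illustration of why $n^2/s \ll n$ matters, the Python sketch below plugs the three-player setting and $s \approx 12000$ quoted in this paper into the two asymptotic operation counts. Setting the constants hidden in the $O(\cdot)$ notation to 1 is an assumption made purely for illustration, and the circuit size of one million multiplications is a hypothetical value.

```python
def she_preprocessing_ops(n: int, s: int, circuit_size: int) -> float:
    """Public-key operations for the SHE-based preprocessing, O(n^2/s * |C|),
    with the hidden constant taken to be 1 (illustrative assumption)."""
    return n * n * circuit_size / s


def pure_fhe_ops(n: int, circuit_size: int) -> int:
    """Public-key operations for the pure FHE approach, O(n * |C|),
    again with the hidden constant taken to be 1."""
    return n * circuit_size


n, s, circuit_size = 3, 12000, 1_000_000  # circuit_size is hypothetical
print(she_preprocessing_ops(n, s, circuit_size))  # 750.0
print(pure_fhe_ops(n, circuit_size))              # 3000000
```

The ratio of the two counts is $n/s$, so under these assumptions the SHE-based preprocessing stays far ahead until the number of players approaches the number of SIMD slots.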

Performance in practice. Both the preprocessing and online phase have been implemented and
tested for 3 players on up-to-date machines connected on a LAN. The preprocessing takes about
13 ms amortized time to prepare one multiplication in $\mathbb{F}_p$ for a 64-bit p, with security level corresponding
roughly to 1024-bit RSA and an error probability of $2^{-40}$ for the zero-knowledge proofs
(the error probability can be lowered to $2^{-80}$ by repeating the ZK proofs, which will at most double
the time). This is 2-3 orders of magnitude faster than preliminary estimates for the most efficient
instantiation of [5]. The online phase executes a secure 64-bit multiplication in 0.05 ms amortized
time. These rough orders of magnitude, and the ability to deal with a non-trivial number of players,
are borne out by a recent implementation of the protocols described in this paper [11].

Concurrent Related Work. In recent independent work [24,2,16], Myers et al., Asharov et al.
and Gentry et al. also use an FHE scheme for multiparty computation. They follow the pure FHE
approach mentioned above, using a threshold decryption protocol tailored to the specific FHE
scheme. They focus primarily on round complexity, while we want to minimize the computational
overhead. We note that in [16], Gentry et al. obtain small overhead by showing a way to use the
FHE SIMD approach for computing any circuit homomorphically. However, this requires full FHE
with bootstrapping (to work on arbitrary circuits) and does not (currently) lead to a practical
protocol.

In [25], Nielsen et al. consider secure computing for Boolean circuits. Their online phase is
similar to that of [5], while the preprocessing is a clever and very efficient construction based on
Oblivious Transfer. This result is complementary to ours in the sense that we target computations
over large fields, which is good for some applications, whereas for other cases Boolean circuits are the
most compact way to express the desired computation. Of course, one could use the preprocessing
from [25] to set up data for our online phase, but current benchmarks indicate that our approach
is faster for large fields, say of size 64 bits or more.

We end the introduction by covering some basic notation which will be used throughout this
paper. For a vector $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$ we denote by $\|\mathbf{x}\|_\infty := \max_{1 \le i \le n} |x_i|$, $\|\mathbf{x}\|_1 := \sum_{1 \le i \le n} |x_i|$
and $\|\mathbf{x}\|_2 := \sqrt{\sum_{1 \le i \le n} |x_i|^2}$. We let $\epsilon(\kappa)$ denote an unspecified negligible function of $\kappa$. If S is a set we
let $x \leftarrow S$ denote assignment to the variable x with respect to a uniform distribution on S; we use
$x \leftarrow s$ for a value s as shorthand for $x \leftarrow \{s\}$. If A is an algorithm, $x \leftarrow A$ means assign to x the
output of A, where the probability distribution is over the random coins of A. Finally, $x := y$ means
“x is defined to be y”.

2  Online  Protocol 

Our aim is to construct a protocol for arithmetic multiparty computation over $\mathbb{F}_{p^k}$ for some prime
p. More precisely, we wish to implement the ideal functionality $\mathcal{F}_{\mathrm{AMPC}}$, presented in Figure 15 in
Appendix E of the full version. Our MPC protocol is structured in a preprocessing (or offline) phase
and an online phase. We start out in this section by presenting the online phase, which assumes
access to an ideal functionality $\mathcal{F}_{\mathrm{PREP}}$ (Figure 16 of Appendix E). In Section 5 we show how to
implement this functionality in an independent preprocessing phase.

In  our  specification  of  the  online  protocol,  we  assume  for  simplicity  that  a  broadcast  channel 
is  available  at  unit  cost,  that  each  party  has  only  one  input,  and  only  one  public  output  value 
is to be computed. In Appendix A.3 we explain how to implement the broadcasts we need from
point-to-point channels and lift the restriction on the number of inputs and outputs without this
affecting the overall complexity.

Before presenting the concrete online protocol we give the intuition and motivation behind
the construction. We will use unconditionally secure MACs to protect secret values from being
manipulated by an active adversary. However, rather than authenticating shares of secret values as
in [5], we authenticate the shared value itself. More concretely, we will use a global key $\alpha$ chosen
randomly in $\mathbb{F}_{p^k}$, and for each secret value a, we will share a additively among the players, and we
also secret-share a MAC $\alpha a$. This way of representing secret values is linear, just like the representation
in [5], and we can therefore do secure multiplication based on multiplication triples à la Beaver [3]
that we produce in the preprocessing.

An immediate problem is that opening a value reliably seems to require that we check the MAC,
and this requires players to know $\alpha$. However, as soon as $\alpha$ is known, MACs on other values can be
forged. We solve this problem by postponing the check on the MACs (of opened values) to the
output phase (of course, this may mean that some of the opened values are incorrect). During the
output phase players generate a random linear combination of both the opened values and their
shares of the corresponding MACs; they commit to the results and only then open $\alpha$ (see Figure
1). The intuition is that, because of the commitments, when $\alpha$ is revealed it is too late for corrupt
players to exploit knowledge of the key. Therefore, if the MAC checks out, all opened values were
correct with high probability, so we can trust that the output values we computed are correct and
can safely open them.

Protocol $\Pi_{\mathrm{ONLINE}}$

Initialize: The parties first invoke the preprocessing to get the shared secret key $[\![\alpha]\!]$, a sufficient number of
multiplication triples $(\langle a\rangle, \langle b\rangle, \langle c\rangle)$, and pairs of random values $\langle r\rangle, [\![r]\!]$, as well as single random values $[\![t]\!], [\![e]\!]$.
Then the steps below are performed in sequence according to the structure of the circuit to compute.

Input: To share $P_i$'s input $x_i$, $P_i$ takes an available pair $\langle r\rangle, [\![r]\!]$. Then, do the following:

1. $[\![r]\!]$ is opened to $P_i$ (if it is known in advance that $P_i$ will provide input, this step can be done already
in the preprocessing stage).

2. $P_i$ broadcasts $\epsilon \leftarrow x_i - r$.

3. The parties compute $\langle x_i\rangle \leftarrow \langle r\rangle + \epsilon$.

Add: To add two representations $\langle x\rangle, \langle y\rangle$, the parties locally compute $\langle x\rangle + \langle y\rangle$.

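Multiply: To multiply $\langle x\rangle, \langle y\rangle$ the parties do the following:

1. They take two triples $(\langle a\rangle, \langle b\rangle, \langle c\rangle)$, $(\langle f\rangle, \langle g\rangle, \langle h\rangle)$ from the set of the available ones and check that indeed
$a \cdot b = c$:

  - Open a representation of a random value $[\![t]\!]$.

  - Partially open $t \cdot \langle a\rangle - \langle f\rangle$ to get $\rho$ and $\langle b\rangle - \langle g\rangle$ to get $\sigma$.

  - Evaluate $t \cdot \langle c\rangle - \langle h\rangle - \sigma \cdot \langle f\rangle - \rho \cdot \langle g\rangle - \sigma \cdot \rho$, and partially open the result.

  - If the result is not zero the players abort; otherwise go on with $\langle a\rangle, \langle b\rangle, \langle c\rangle$.

Note that this check could in fact be done as part of the preprocessing. Moreover, it can be done for all
triples in parallel, and so we actually need only one random value t.

2. The parties partially open $\langle x\rangle - \langle a\rangle$ to get $\epsilon$ and $\langle y\rangle - \langle b\rangle$ to get $\delta$, and compute $\langle z\rangle \leftarrow \langle c\rangle + \epsilon\langle b\rangle + \delta\langle a\rangle + \epsilon\delta$.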

Output: We enter this stage when the players have $\langle y\rangle$ for the output value y, but this value has not been
opened (the output value is only correct if players have behaved honestly). We then do the following:

1. Let $a_1, \ldots, a_T$ be all values publicly opened so far, where $\langle a_j\rangle = (\delta_j, (a_{j,1}, \ldots, a_{j,n}), (\gamma(a_j)_1, \ldots, \gamma(a_j)_n))$.
Now, a random value $[\![e]\!]$ is opened, and players set $e_i = e^i$ for $i = 1, \ldots, T$. All players compute
$a \leftarrow \sum_j e_j a_j$.

2. Each $P_i$ calls $\mathcal{F}_{\mathrm{COM}}$ to commit to $\gamma_i \leftarrow \sum_j e_j \gamma(a_j)_i$. For the output value $\langle y\rangle$, $P_i$ also commits to his
share $y_i$, and his share $\gamma(y)_i$ in the corresponding MAC.

3. $[\![\alpha]\!]$ is opened.

4. Each $P_i$ asks $\mathcal{F}_{\mathrm{COM}}$ to open $\gamma_i$, and all players check that $\alpha(a + \sum_j e_j \delta_j) = \sum_i \gamma_i$. If this is not OK, the
protocol aborts. Otherwise the players conclude that the output value is correctly computed.

5. To get the output value y, the commitments to $y_i, \gamma(y)_i$ are opened. Now, y is defined as $y := \sum_i y_i$ and
each player checks that $\alpha(y + \delta) = \sum_i \gamma(y)_i$; if so, y is the output.

Fig. 1. The online phase.
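The batched MAC check in steps 1-4 of the Output stage can be simulated in a few lines. The Python sketch below is a single-process illustration under several assumptions: k = 1 with a hypothetical Mersenne-prime modulus, a trusted dealer in place of the preprocessing, plain additive shares of $\alpha$ in place of the $[\![\alpha]\!]$ representation, and no real commitments (the $\gamma_i$ are simply fixed before $\alpha$ is reconstructed). The helper names `share`, `authenticate` and `output_check` are hypothetical.

```python
from __future__ import annotations

import secrets

P = 2**61 - 1  # hypothetical prime standing in for p^k (k = 1 here)


def share(value: int, n: int) -> list[int]:
    """Additively share value mod P among n players."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(value - sum(parts)) % P]


def authenticate(value: int, alpha: int, n: int, delta: int = 0):
    """Dealer stand-in: <value> = (delta, value shares, shares of alpha*(value+delta))."""
    return delta, share(value, n), share(alpha * (value + delta) % P, n)


def output_check(opened, reps, alpha_shares, e) -> bool:
    """Batched MAC check of the Output stage (steps 1-4).

    The gamma_i are computed before alpha is reconstructed, mimicking the
    commit-then-reveal order; real commitments via F_COM are not modelled.
    """
    n = len(alpha_shares)
    coeffs = [pow(e, j, P) for j in range(1, len(opened) + 1)]  # e_j = e^j
    a = sum(c * aj for c, aj in zip(coeffs, opened)) % P        # a = sum_j e_j a_j
    gammas = [sum(c * rep[2][i] for c, rep in zip(coeffs, reps)) % P
              for i in range(n)]                                # "committed" gamma_i
    alpha = sum(alpha_shares) % P                               # step 3: open alpha
    delta_sum = sum(c * rep[0] for c, rep in zip(coeffs, reps)) % P
    return alpha * (a + delta_sum) % P == sum(gammas) % P       # step 4


n, alpha, e = 3, 123456789, 987654321  # fixed nonzero values for the demo
alpha_shares = share(alpha, n)
values = [5, 42, 7]
reps = [authenticate(v, alpha, n) for v in values]
print(output_check(values, reps, alpha_shares, e))      # True
print(output_check([5, 43, 7], reps, alpha_shares, e))  # False: a cheated opening
```

In the protocol e comes from $[\![e]\!]$ and is revealed only after all values have been opened; here it is fixed so the demo is reproducible. A single wrong opening of $a_j$ shifts the left-hand side by $\alpha e^j$ times the error, which is nonzero because P is prime and $\alpha, e \neq 0$.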
Representation of values and MACs. In the online phase each shared value $a \in \mathbb{F}_{p^k}$ is represented
as follows:

$$\langle a\rangle := (\delta, (a_1, \ldots, a_n), (\gamma(a)_1, \ldots, \gamma(a)_n)),$$

where $a = a_1 + \cdots + a_n$ and $\gamma(a)_1 + \cdots + \gamma(a)_n = \alpha(a + \delta)$. Player $P_i$ holds $a_i, \gamma(a)_i$ and $\delta$ is public.
The interpretation is that $\gamma(a) \leftarrow \gamma(a)_1 + \cdots + \gamma(a)_n$ is the MAC authenticating a under the global
key $\alpha$.

Computations. Using the natural component-wise addition of representations, and suppressing
the underlying choices of $a_i, \gamma(a)_i$ for readability, we clearly have for secret values a, b and public
constant e that

$$\langle a\rangle + \langle b\rangle = \langle a + b\rangle, \quad e \cdot \langle a\rangle = \langle ea\rangle, \quad \text{and} \quad e + \langle a\rangle = \langle e + a\rangle,$$

where $e + \langle a\rangle := (\delta - e, (a_1 + e, a_2, \ldots, a_n), (\gamma(a)_1, \ldots, \gamma(a)_n))$. This possibility to easily add a public
value is the reason for the “public modifier” $\delta$ in the definition of $\langle\cdot\rangle$. It is now clear that we can
do secure linear computations directly on values represented this way.
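The linear operations on $\langle\cdot\rangle$ can be sketched directly. The following Python code is an illustration assuming k = 1, a hypothetical prime modulus and a trusted dealer that produces representations with $\delta = 0$; the names `Rep`, `deal` and `reconstruct` are hypothetical. After a chain of operations it checks that $\sum_i \gamma(a)_i = \alpha(a + \delta)$ still holds, which is why adding a public constant shifts $\delta$ rather than touching the MAC shares.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass

P = 2**61 - 1  # hypothetical prime modulus standing in for F_{p^k} (k = 1)


def share(value: int, n: int) -> list[int]:
    """Additively share value mod P among n players."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(value - sum(parts)) % P]


@dataclass
class Rep:
    """<a> = (delta, (a_1, ..., a_n), (gamma(a)_1, ..., gamma(a)_n))."""
    delta: int
    shares: list[int]
    macs: list[int]

    def __add__(self, other: Rep) -> Rep:
        # <a> + <b> = <a + b>: every component is added locally.
        return Rep((self.delta + other.delta) % P,
                   [(x + y) % P for x, y in zip(self.shares, other.shares)],
                   [(x + y) % P for x, y in zip(self.macs, other.macs)])

    def scale(self, e: int) -> Rep:
        # e * <a> = <e a>: the public modifier delta is scaled as well.
        return Rep(self.delta * e % P,
                   [x * e % P for x in self.shares],
                   [g * e % P for g in self.macs])

    def add_public(self, e: int) -> Rep:
        # e + <a> := (delta - e, (a_1 + e, a_2, ..., a_n), MAC shares unchanged).
        shares = [(self.shares[0] + e) % P] + self.shares[1:]
        return Rep((self.delta - e) % P, shares, list(self.macs))


def deal(value: int, alpha: int, n: int) -> Rep:
    """Trusted-dealer stand-in producing <value> with delta = 0."""
    return Rep(0, share(value, n), share(alpha * value % P, n))


def reconstruct(rep: Rep, alpha: int) -> tuple[int, bool]:
    """Return the shared value and whether sum_i gamma_i == alpha*(a + delta)."""
    value = sum(rep.shares) % P
    return value, sum(rep.macs) % P == alpha * (value + rep.delta) % P


alpha, n = 1234567, 3
x, y = deal(10, alpha, n), deal(32, alpha, n)
z = (x + y).scale(2).add_public(5)  # 2 * (10 + 32) + 5
print(reconstruct(z, alpha))          # (89, True)
```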

What remains is multiplications: here we use the preprocessing. We would like the preprocessing
to output random triples $\langle a\rangle, \langle b\rangle, \langle c\rangle$, where $c = ab$. However, our preprocessing produces triples
which satisfy $c = ab + \Delta$, where $\Delta$ is an error that can be introduced by the adversary. We therefore
need to check the triple before we use it. The check can be done by “sacrificing” another triple
$\langle f\rangle, \langle g\rangle, \langle h\rangle$, where the same multiplicative equality should hold (see the protocol for details). Given
such a valid triple, we can do multiplications in the following standard way: to compute $\langle xy\rangle$ we
first open $\langle x\rangle - \langle a\rangle$ to get $\epsilon$, and $\langle y\rangle - \langle b\rangle$ to get $\delta$. Then $xy = (a + \epsilon)(b + \delta) = c + \epsilon b + \delta a + \epsilon\delta$.
Thus, the new representation can be computed as

$$\langle x\rangle \cdot \langle y\rangle = \langle c\rangle + \epsilon\langle b\rangle + \delta\langle a\rangle + \epsilon\delta.$$
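Triple checking by sacrifice and Beaver multiplication can be sketched on the same (δ, shares, MAC shares) tuples. This Python illustration assumes k = 1, a hypothetical prime modulus, a trusted dealer in place of the SHE-based preprocessing (with an optional additive error $\Delta$ to mimic a cheating adversary), and partial openings that simply sum the shares. The function names are hypothetical; the MACs are carried along but, as in the protocol, not checked until the output phase.

```python
import secrets

P = 2**61 - 1  # hypothetical prime modulus standing in for F_{p^k} (k = 1)


def share(value, n):
    """Additively share value mod P among n players."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(value - sum(parts)) % P]


def deal(value, alpha, n):
    """Trusted-dealer stand-in for the preprocessing: <value> with delta = 0."""
    return (0, share(value, n), share(alpha * value % P, n))


def lin(coeffs, reps, const=0):
    """Locally compute sum_k coeffs[k] * <reps[k]> + const on (delta, shares, macs)."""
    n = len(reps[0][1])
    delta = sum(c * r[0] for c, r in zip(coeffs, reps))
    shares = [sum(c * r[1][i] for c, r in zip(coeffs, reps)) % P for i in range(n)]
    macs = [sum(c * r[2][i] for c, r in zip(coeffs, reps)) % P for i in range(n)]
    shares[0] = (shares[0] + const) % P         # P_1 adds the public constant
    return ((delta - const) % P, shares, macs)  # delta absorbs it, MACs unchanged


def partial_open(rep):
    """Partial opening: reconstruct the value without checking any MAC."""
    return sum(rep[1]) % P


def sacrifice(abc, fgh, t):
    """Check a*b = c by sacrificing (f, g, h), as in step 1 of Multiply."""
    a, b, c = abc
    f, g, h = fgh
    rho = partial_open(lin([t, -1], [a, f]))    # t*a - f
    sigma = partial_open(lin([1, -1], [b, g]))  # b - g
    # t*c - h - sigma*f - rho*g - sigma*rho opens to 0 when both triples are good
    check = lin([t, -1, -sigma, -rho], [c, h, f, g], const=-sigma * rho)
    return partial_open(check) == 0


def beaver_mul(x, y, triple):
    """Step 2 of Multiply: <z> = <c> + eps*<b> + dlt*<a> + eps*dlt."""
    a, b, c = triple
    eps = partial_open(lin([1, -1], [x, a]))
    dlt = partial_open(lin([1, -1], [y, b]))
    return lin([1, eps, dlt], [c, b, a], const=eps * dlt)


def random_triple(alpha, n, error=0):
    """Hypothetical dealer output; error != 0 models the adversarial Delta."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return deal(a, alpha, n), deal(b, alpha, n), deal((a * b + error) % P, alpha, n)


alpha, n, t = 1234567, 3, 99  # t fixed and nonzero for a reproducible demo
good = random_triple(alpha, n)
print(sacrifice(good, random_triple(alpha, n), t))                     # True
print(sacrifice(random_triple(alpha, n, error=1), random_triple(alpha, n), t))  # False
z = beaver_mul(deal(6, alpha, n), deal(7, alpha, n), good)
print(partial_open(z))                                                 # 42
```

A triple with error $\Delta$ makes the sacrifice value open to $t\Delta$, so the check only passes if $t = 0$, which happens with probability $1/P$ for a random t.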

An important note is that during our protocol we are actually not guaranteed that we are
working with the correct results, since we do not immediately check the MACs of the opened
values. During the first part of the protocol, parties will only do what we define as a partial
opening, meaning that for a value $\langle a\rangle$, each party $P_i$ sends $a_i$ to $P_1$, who computes $a = a_1 + \cdots + a_n$
and broadcasts a to all players. We assume here for simplicity that we always go via $P_1$, whereas
in practice one would balance the workload over the players.

As sketched earlier, we postpone the checking to the end of the protocol in the output phase.
To check the MACs we need the global key $\alpha$. We get $\alpha$ from the preprocessing but in a slightly
different representation:

$$[\![a]\!] := ((a_1, \ldots, a_n), (\beta_i, \gamma(a)^i_1, \ldots, \gamma(a)^i_n)_{i=1,\ldots,n}),$$

where $a = \sum_i a_i$ and $\sum_j \gamma(a)^i_j = a\beta_i$. Player $P_i$ holds $a_i, \beta_i, \gamma(a)^1_i, \ldots, \gamma(a)^n_i$. The idea is that
$\gamma(a)^i \leftarrow \sum_j \gamma(a)^i_j$ is the MAC authenticating a under $P_i$'s private key $\beta_i$. To open $[\![a]\!]$, each $P_j$
sends to each $P_i$ his share $a_j$ of a and his share $\gamma(a)^i_j$ of the MAC on a made with $P_i$'s private
key, and then $P_i$ checks that $\sum_j \gamma(a)^i_j = a\beta_i$. (To open the value to only one party $P_i$, the other
parties will simply send their shares only to $P_i$, who will do the checking. Only shares of a and $a\beta_i$
are needed.)

Finally, the preprocessing will also output n pairs of a random value r in both of the presented
representations $\langle r\rangle, [\![r]\!]$. These pairs are used in the Input phase of the protocol.
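Opening $[\![\cdot]\!]$ to a single party, and the Input stage that consumes a pair $\langle r\rangle, [\![r]\!]$, can be sketched as follows. This Python code assumes k = 1, a hypothetical prime modulus, a trusted dealer, and fixed, hypothetical private keys $\beta_i$; player indices are 0-based in the code. `open_to` performs $P_i$'s check $\sum_j \gamma(a)^i_j = a\beta_i$, and `input_phase` follows the three Input steps of Figure 1.

```python
import secrets

P = 2**61 - 1  # hypothetical prime modulus standing in for F_{p^k} (k = 1)


def share(value, n):
    """Additively share value mod P among n players."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(value - sum(parts)) % P]


def deal_bracket(value, betas):
    """[[value]]: shares of value plus, for each i, shares of value * beta_i.

    mac_shares[i][j] plays the role of gamma(value)^i_j, held by P_j.
    """
    n = len(betas)
    return share(value, n), [share(value * beta % P, n) for beta in betas]


def open_to(i, bracket, betas):
    """Open [[value]] to P_i only; P_i checks sum_j gamma^i_j == value * beta_i."""
    value_shares, mac_shares = bracket
    value = sum(value_shares) % P
    if sum(mac_shares[i]) % P != value * betas[i] % P:
        raise ValueError("MAC check failed: abort")
    return value


def input_phase(i, x_i, angle_r, bracket_r, betas):
    """Input stage of Fig. 1 for player P_i (0-based index here)."""
    r = open_to(i, bracket_r, betas)  # step 1: [[r]] is opened to P_i
    eps = (x_i - r) % P               # step 2: P_i broadcasts eps = x_i - r
    delta, shares, macs = angle_r     # step 3: <x_i> <- <r> + eps
    shares = [(shares[0] + eps) % P] + shares[1:]
    return (delta - eps) % P, shares, macs


n, alpha = 3, 1234567
betas = [111, 222, 333]  # hypothetical private keys beta_i, fixed for the demo
r = secrets.randbelow(P)
angle_r = (0, share(r, n), share(alpha * r % P, n))
bracket_r = deal_bracket(r, betas)
x = input_phase(1, 2024, angle_r, bracket_r, betas)
print(sum(x[1]) % P)                                # 2024
print(sum(x[2]) % P == alpha * (2024 + x[0]) % P)  # True
```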
The full protocol for the online phase is shown in Figure 1. It assumes access to a commitment
functionality $\mathcal{F}_{\mathrm{COM}}$ that simply receives values to commit to from players, stores them and reveals
a value to all players on request from the committer. Such a functionality could be implemented
efficiently based, e.g., on Paillier encryption or the DDH assumption [12,19]. However, we show
in Appendix A.3 that we can do ideal commitments based only on $\mathcal{F}_{\mathrm{PREP}}$ and with cost $O(n^2)$
computation and communication.

Complexity. The (amortized) cost of a secure multiplication is easily seen to be $O(n)$ local elementary
operations in $\mathbb{F}_{p^k}$, and communication of $O(n)$ field elements. Linear operations have the same
computational cost but require no communication. The input stage requires $O(n)$ communication
and computation to open $[\![r]\!]$ to $P_i$ and one broadcast. Doing the output stage requires opening
$O(n)$ commitments. In fact, the total number of commitments used is also $O(n)$, so this adds an
$O(n^3)$ term to the complexity. In total, we therefore get the complexity claimed in the introduction:
$O(n \cdot |C| + n^3)$ elementary field operations and storage/communication complexity $O(n \cdot |C| + n^3)$
field elements.

We can now state the theorem on security of the online phase; its proof is in Appendix A.3.

Theorem 1. In the $(\mathcal{F}_{\mathrm{PREP}}, \mathcal{F}_{\mathrm{COM}})$-hybrid model, the protocol $\Pi_{\mathrm{ONLINE}}$ implements $\mathcal{F}_{\mathrm{AMPC}}$ with statistical
security against any static² active adversary corrupting up to $n-1$ parties.

Based  on  a  result  from  [29],  we  can  also  show  a  lower  bound  on  the  amount  of  preprocessing 
data  and  work  required  for  a  protocol.  The  proof  is  in  Appendix  B. 

Theorem 2. Assume a protocol $\pi$ in the preprocessing model can compute any circuit over $\mathbb{F}_{p^k}$ of
size at most S, with security against active corruption of at most $n-1$ players. We assume that
the players supply roughly the same number of inputs ($O(S/n)$ each), and that any player may
receive output. Then the preprocessing must output $\Omega(S \log p^k)$ bits to each player, and for any
player $P_i$, there exists a circuit C satisfying the conditions above, where secure computation of C
requires $P_i$ to execute an expected number of bit operations that is $\Omega(S \log p^k)$.

It is easy to see that our protocol satisfies the conditions in the theorem and that it meets the
first bound up to a constant factor and the second up to a poly-logarithmic factor (as a function
of the security parameter).

3  The  Abstract  Somewhat  Homomorphic  Encryption  Scheme 

In this section we specify the abstract properties we need for our cryptosystem. A concrete instantiation
is found in Section 6.

We first define the plaintext space M. This will be given by a direct product of finite fields
$(\mathbb{F}_{p^k})^s$ of characteristic p. Componentwise addition and multiplication of elements in M will be
denoted by + and ·. We assume there is an injective encoding function encode which takes elements
in $(\mathbb{F}_{p^k})^s$ to elements in a ring R which is equal (as a $\mathbb{Z}$-module) to $\mathbb{Z}^N$ for some integer N. We also
assume a decode function which takes arbitrary elements in $\mathbb{Z}^N$ and returns an element in $(\mathbb{F}_{p^k})^s$.
We require that $\mathrm{decode}(\mathrm{encode}(m)) = m$ for all $m \in M$, and that the decode operation is
compatible with the characteristic of the field, i.e., for any $x \in \mathbb{Z}^N$ we have $\mathrm{decode}(x) = \mathrm{decode}(x \bmod p)$.
Finally, we require that the encoding function produces “short” vectors. More precisely, for
all $m \in M$ we require $\|\mathrm{encode}(m)\|_\infty \le \tau$, where $\tau = p/2$.

² The protocol is in fact adaptively secure; here we only show static security since our preprocessing is anyway only
statically secure.

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED

The two operations in R will be denoted by + and ·. The addition operation in R is assumed
to be componentwise addition, whereas we make no assumption on multiplication. All we require
is that the following properties hold for all elements $m_1, m_2 \in M$:

$$\mathrm{decode}(\mathrm{encode}(m_1) + \mathrm{encode}(m_2)) = m_1 + m_2,$$
$$\mathrm{decode}(\mathrm{encode}(m_1) \cdot \mathrm{encode}(m_2)) = m_1 \cdot m_2.$$

From now on, when we discuss the plaintext space M we assume it comes implicitly with the encode
and decode functions for some integer N. If an element in M has the same component in each of
the s slots, then we call it a “diagonal” element. We let $\mathrm{Diag}(x)$ for $x \in \mathbb{F}_{p^k}$ denote the element
$(x, x, \ldots, x) \in (\mathbb{F}_{p^k})^s$.
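The encode/decode interface can be made concrete with a deliberately trivial instance, which is not the cyclotomic-ring instantiation of Section 6: take k = 1, $R = \mathbb{Z}^N$ with $N = s$ and componentwise multiplication, encode as the centred lift and decode as reduction mod p. The Python sketch below, with a hypothetical small prime p = 101 and s = 4, checks the required properties. It offers no security; it only illustrates the algebraic requirements.

```python
P = 101  # hypothetical small odd prime; k = 1, so each slot is F_p
S = 4    # number of SIMD slots s; in this toy ring N = s


def encode(m):
    """Centred lift of m in (F_p)^s to Z^N, entries in [-(p-1)/2, (p-1)/2]."""
    return [v - P if v > P // 2 else v for v in (x % P for x in m)]


def decode(x):
    """Reduce an arbitrary integer vector mod p, slot by slot."""
    return [v % P for v in x]


def ring_add(x, y):
    # Addition in R is componentwise, as the abstract scheme requires.
    return [a + b for a, b in zip(x, y)]


def ring_mul(x, y):
    # Toy assumption: R = Z^N with componentwise product, NOT the cyclotomic
    # ring used by the concrete instantiation.
    return [a * b for a, b in zip(x, y)]


def diag(v):
    """Diag(v) = (v, v, ..., v) in (F_p)^s."""
    return [v % P] * S


m1, m2 = [1, 2, 3, 100], [50, 60, 70, 80]
print(encode(m1))                                # [1, 2, 3, -1]
print(decode(ring_mul(encode(m1), encode(m2))))  # [50, 19, 8, 21]
```

The centred lift is what makes $\|\mathrm{encode}(m)\|_\infty \le p/2$ hold; a naive lift to $[0, p)$ would satisfy the decode properties but double the bound.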

Our cryptosystem consists of a tuple (ParamGen, KeyGen, KeyGen*, Enc, Dec) of algorithms defined
below, and parametrized by a security parameter $\kappa$.

ParamGen($1^\kappa$, M): This parameter generation algorithm outputs an integer N (as above), definitions
of the encode and decode functions, and a description of a randomized algorithm $D^d_\rho$, which outputs
vectors in $\mathbb{Z}^d$. We assume that $D^d_\rho$ outputs r with $\|r\|_\infty \le \rho$, except with negligible probability.
The algorithm $D^d_\rho$ is used by the encryption algorithm to select the random coins needed during
encryption. The algorithm ParamGen also outputs an additive abelian group G. The group G also
possesses a (not necessarily closed) multiplicative operator, which is commutative and distributes
over the additive group of G. The group G is the group in which the ciphertexts will be assumed to
lie. We write $\boxplus$ and $\boxtimes$ for the operations on G, and extend these in the natural way to vectors and
matrices of elements of G. Finally, ParamGen outputs a set C of allowable arithmetic SIMD circuits
over $(\mathbb{F}_{p^k})^s$; these are the set of functions which our scheme will be able to evaluate ciphertexts
over. We can think of C as a subset of $\mathbb{F}_{p^k}[X_1, X_2, \ldots, X_n]$, where we evaluate a function $f \in \mathbb{F}_{p^k}[X_1, X_2, \ldots, X_n]$
a total of s times in parallel on inputs from $(\mathbb{F}_{p^k})^n$. We assume that all other
algorithms take as implicit input the output $P \leftarrow (1^\kappa, N, \mathrm{encode}, \mathrm{decode}, D^d_\rho, G, C)$ of ParamGen.
KeyGen(): This algorithm outputs a public key pk and a secret key sk.

$\mathrm{Enc}_{pk}(x, r)$: On input of $x \in \mathbb{Z}^N$, $r \in \mathbb{Z}^d$, this deterministic algorithm outputs a ciphertext $c \in G$.
When applying this algorithm one would obtain x from the application of the encode function,
and r by calling $D^d_\rho$. This is what we mean when we write $\mathrm{Enc}_{pk}(m)$, where $m \in M$. However,
it is convenient for us to define Enc on the intermediate state, $x = \mathrm{encode}(m)$. To ease notation
we write $\mathrm{Enc}_{pk}(x)$ if the value of the randomness r is not important for our discussion. To make
our zero-knowledge proofs below work, we will require that addition of V “clean” ciphertexts (for
“small” values of V), of plaintexts $x_i$ in $\mathbb{Z}^N$, using randomness $r_i$, results in a ciphertext which
could be obtained by adding the plaintexts and randomness, as integer vectors, and then applying
$\mathrm{Enc}_{pk}(x, r)$, i.e.,

$$\mathrm{Enc}_{pk}(x_1 + \cdots + x_V, r_1 + \cdots + r_V) = \mathrm{Enc}_{pk}(x_1, r_1) \boxplus \cdots \boxplus \mathrm{Enc}_{pk}(x_V, r_V).$$

$\mathrm{Dec}_{sk}(c)$: On input the secret key and a ciphertext c it returns either an element $m \in M$, or the
symbol $\perp$.

We are now able to define various properties of the above abstract scheme that we will require. But
first a bit of notation: for a function $f \in C$ we let $n(f)$ denote the number of variables in f, and we
let $\hat{f}$ denote the function on G induced by f. That is, given f, we replace every + operation with a
$\boxplus$, every · operation with a $\boxtimes$, and every constant c with $\mathrm{Enc}_{pk}(\mathrm{encode}(c), 0)$.
Also, given a set of $n(f)$ vectors $\mathbf{x}_1, \ldots, \mathbf{x}_{n(f)}$, we define $f(\mathbf{x}_1, \ldots, \mathbf{x}_{n(f)})$ in the natural way by
applying f in parallel on each coordinate.

Correctness: Intuitively, correctness means that if one decrypts the result of a function $f \in C$
applied to $n(f)$ encrypted vectors of variables, then this should return the same value as applying
the function to the $n(f)$ plaintexts. However, to apply the scheme in our protocol, we need to
be a bit more liberal, namely the decryption result should be correct even if the ciphertexts we
start from were not necessarily generated by the normal encryption algorithm. They only need to
“contain” encodings and randomness that are not too large, such that the encodings decode to legal
values. Formally, the scheme is said to be $(B_{\mathrm{plain}}, B_{\mathrm{rand}}, C)$-correct if

$$\Pr\Big[\, P \leftarrow \mathrm{ParamGen}(1^\kappa, M),\ (pk, sk) \leftarrow \mathrm{KeyGen}(),\ \text{for any } f \in C,\ \text{any } x_i, r_i \text{ with } \|x_i\|_\infty \le B_{\mathrm{plain}},\ \|r_i\|_\infty \le B_{\mathrm{rand}},\ \mathrm{decode}(x_i) \in (\mathbb{F}_{p^k})^s,\ i = 1, \ldots, n(f),\ \text{and } c_i \leftarrow \mathrm{Enc}_{pk}(x_i, r_i),\ c \leftarrow \hat{f}(c_1, \ldots, c_{n(f)}) :\ \mathrm{Dec}_{sk}(c) \neq f(\mathrm{decode}(x_1), \ldots, \mathrm{decode}(x_{n(f)}))\,\Big] < \epsilon(\kappa).$$

We will say that a ciphertext is $(B_{\mathrm{plain}}, B_{\mathrm{rand}}, C)$-admissible if it can be obtained as the ciphertext
c in the above experiment, i.e., by applying a function from C to ciphertexts generated from (legal)
encodings and randomness that are bounded by $B_{\mathrm{plain}}$ and $B_{\mathrm{rand}}$.

KeyGen*(): This is a randomized algorithm that outputs a meaningless public key $\widetilde{pk}$. We require that an encryption of any message, $\mathsf{Enc}_{\widetilde{pk}}(x)$, is statistically indistinguishable from an encryption of 0.

Furthermore, if we set $(pk, sk) \leftarrow \mathsf{KeyGen}()$ and $\widetilde{pk} \leftarrow \mathsf{KeyGen}^*()$, then $pk$ and $\widetilde{pk}$ are computationally indistinguishable. This implies the scheme is IND-CPA secure in the usual sense.

Distributed Decryption: We assume, as a set-up assumption, that a common public key has been set up where the secret key has been secret-shared among the players in such a way that they can collaborate to decrypt a ciphertext. We assume throughout that only $(B_{\mathsf{plain}}, B_{\mathsf{rand}}, C)$-admissible ciphertexts are to be decrypted; this constraint is guaranteed by our main protocol.

We note that some set-up assumption is always required to show UC security, which is our goal here. Concretely, we assume that a functionality $\mathcal{F}_{\mathsf{KeyGen}}$ is available, as specified in Figure 2. It basically generates a key pair and secret-shares the secret key among the players using a secret-sharing scheme that is assumed to be given as part of the specification of the cryptosystem. Since we want to allow corruption of all but one player, the maximal unqualified sets must be all sets of $n - 1$ players.


Functionality $\mathcal{F}_{\mathsf{KeyGen}}$

1. When receiving “start” from all honest players, run $P \leftarrow \mathsf{ParamGen}(1^\kappa, M)$, and then, using the parameters generated, run $(pk, sk) \leftarrow \mathsf{KeyGen}()$ (recall $P$, and hence $D$, is an implicit input to all functions we specify). Send $pk$ to the adversary.

2. We assume a secret sharing scheme is given with which $sk$ can be secret-shared. Receive from the adversary a set of shares $s_j$ for each corrupted player $P_j$.

3. Construct a complete set of shares $(s_1, \ldots, s_n)$ consistent with the adversary’s choices and $sk$. Note that this is always possible since the corrupted players form an unqualified set. Send $pk$ to all players and $s_i$ to each honest $P_i$.

Fig.  2.  The  Ideal  Functionality  for  Distributed  Key  Generation 
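As a minimal illustration of step 3, the sketch below assumes an additive secret-sharing scheme modulo a hypothetical modulus `q` (the concrete scheme later in this section shares $s$ and $s \cdot s$ additively) and treats $sk$ as a single residue rather than a ring element; `complete_shares` is an illustrative name, not part of the protocol.

```python
import secrets


def complete_shares(sk, corrupt_shares, n, q):
    """Complete an additive sharing of sk modulo q (F_KeyGen, step 3).

    corrupt_shares maps a corrupted player's index (0-based) to the share
    the adversary chose for it. At most n - 1 players may be corrupt, so at
    least one honest share remains free to absorb the difference.
    """
    if len(corrupt_shares) > n - 1:
        raise ValueError("corrupted players must form an unqualified set")
    shares = [None] * n
    for j, s_j in corrupt_shares.items():
        shares[j] = s_j % q
    honest = [i for i in range(n) if shares[i] is None]
    last = honest[-1]
    # Every honest player except the last one gets a uniform share ...
    for i in honest[:-1]:
        shares[i] = secrets.randbelow(q)
    # ... and the last share is fixed so that all shares sum to sk mod q.
    shares[last] = (sk - sum(shares[i] for i in range(n) if i != last)) % q
    return shares


q = 2**61 - 1
shares = complete_shares(sk=12345, corrupt_shares={0: 7, 2: 99}, n=4, q=q)
print(sum(shares) % q)  # 12345
```

Because any $n - 1$ additive shares are uniformly distributed whatever $sk$ is, fixing the adversary's shares first and solving for one honest share is always possible, which is the observation made in step 3.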
We note that it is possible to make a weaker set-up assumption, such as a common reference string (CRS), and to use a general UC-secure multiparty computation protocol for the CRS model to implement $\mathcal{F}_{\mathsf{KeyGen}}$. While this may not be very efficient, one only needs to run this protocol once in the lifetime of the system.

We also want our cryptosystem to implement the functionality $\mathcal{F}_{\mathsf{KeyGenDec}}$ in Figure 3, which essentially specifies that players can cooperate to decrypt a $(B_{\mathsf{plain}}, B_{\mathsf{rand}}, C)$-admissible ciphertext, but the protocol is only secure against a passive attack: the adversary gets the correct decryption result, but can decide which result the honest players should learn.


Functionality $\mathcal{F}_{\mathsf{KeyGenDec}}$

1. When receiving “start” from all honest players, run $\mathsf{ParamGen}(1^\kappa, M)$, and then, using the parameters generated, run $(pk, sk) \leftarrow \mathsf{KeyGen}()$. Send $pk$ to the adversary and to all players, and store $sk$.

2. Hereafter, on receiving “decrypt $c$” for a $(B_{\mathsf{plain}}, B_{\mathsf{rand}}, C)$-admissible $c$ from all honest players, send $c$ and $m \leftarrow \mathsf{Dec}_{sk}(c)$ to the adversary. On receiving $m'$ from the adversary, send “Result $m'$” to all players. Both $m$ and $m'$ may be a special symbol $\perp$ indicating that decryption failed.

3. On receiving “decrypt $c$ to $P_j$” for admissible $c$, if $P_j$ is corrupt, send $c$ and $m \leftarrow \mathsf{Dec}_{sk}(c)$ to the adversary. If $P_j$ is honest, send $c$ to the adversary. On receiving $\delta$ from the adversary, if $\delta \notin M$, send $\perp$ to $P_j$; if $\delta \in M$, send $\mathsf{Dec}_{sk}(c) + \delta$ to $P_j$.

Fig.  3.  The  Ideal  Functionality  for  Distributed  Key  Generation  and  Decryption 


We  are  now  finally  ready  to  define  the  basic  set  of  properties  that  the  underlying  cryptosystem 
should  satisfy,  in  order  to  be  used  in  our  protocol.  Here  we  use  an  “information  theoretic”  security 
parameter  sec  that  controls  the  errors  in  our  ZK  proofs  below. 

Definition 1. (Admissible Cryptosystem.) Let $C$ contain formulas of the form $(x_1 + \cdots + x_n) \cdot (y_1 + \cdots + y_n) + z_1 + \cdots + z_n$, as well as all “smaller” formulas, i.e., with a smaller number of additions and possibly no multiplication. A cryptosystem is admissible if it is defined by algorithms (ParamGen, KeyGen, KeyGen*, Enc, Dec) with properties as defined above, and is $(B_{\mathsf{plain}}, B_{\mathsf{rand}}, C)$-correct, where

$$B_{\mathsf{plain}} = N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{(1/2+\nu)\mathsf{sec}}, \qquad B_{\mathsf{rand}} = d \cdot \rho \cdot \mathsf{sec}^2 \cdot 2^{(1/2+\nu)\mathsf{sec}},$$

and where $\nu > 0$ can be an arbitrary constant. Finally, there must exist a secret sharing scheme as required in $\mathcal{F}_{\mathsf{KeyGen}}$ and a protocol $\Pi_{\mathsf{KeyGenDec}}$ with the property that, when composed with $\mathcal{F}_{\mathsf{KeyGen}}$, it securely implements the functionality $\mathcal{F}_{\mathsf{KeyGenDec}}$.
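The two bounds in Definition 1 are closed-form expressions. The helper below (a hypothetical name) evaluates them for given $N$, $\tau$, $d$, $\rho$, sec and $\nu$; the toy inputs are chosen only so the arithmetic is easy to check by hand and are not realistic parameters.

```python
def admissible_bounds(N, tau, d, rho, sec, nu):
    """(B_plain, B_rand) from Definition 1.

    B_plain = N * tau * sec^2 * 2^((1/2 + nu) * sec)
    B_rand  = d * rho * sec^2 * 2^((1/2 + nu) * sec)
    Floats are used because (1/2 + nu) * sec need not be an integer.
    """
    factor = sec**2 * 2.0 ** ((0.5 + nu) * sec)
    return N * tau * factor, d * rho * factor


# Hypothetical toy values; d = 3N as in the concrete scheme, tau = p/2.
N, p, sec, nu = 4, 5, 2, 0.5
tau, rho = p / 2, 3
print(admissible_bounds(N, tau, 3 * N, rho, sec, nu))  # (160.0, 576.0)
```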

The set $C$ is defined to contain all computations on ciphertexts that we need in our main protocol. Throughout the paper we will assume that $B_{\mathsf{plain}}$ and $B_{\mathsf{rand}}$ are defined as here in terms of $\tau$, $\rho$ and sec. This is because these are the bounds we can force corrupt players to respect via our zero-knowledge protocol, as we shall see.

4  Zero-Knowledge  Proof  of  Plaintext  Knowledge 

This section presents a zero-knowledge protocol that takes as input sec ciphertexts $c_1, \ldots, c_{\mathsf{sec}}$ generated by one of the players in our protocol, who will act as the prover. If the prover is honest, then $c_i = \mathsf{Enc}_{pk}(x_i, r_i)$, where $x_i$ has been obtained from the encode function, i.e. $\|x_i\|_\infty \le \tau$, and $r_i$ has been generated from $D^d_\rho$ (so we may assume that $\|r_i\|_\infty \le \rho$). Our protocol is a zero-knowledge proof of plaintext knowledge (ZKPoPK) for the following relation:

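$$R_{\mathsf{PoPK}} = \Big\{ (x, w) \;\Big|\; x = (pk, \mathbf{c}),\ w = ((x_1, r_1), \ldots, (x_{\mathsf{sec}}, r_{\mathsf{sec}})) :\ \mathbf{c} = (c_1, \ldots, c_{\mathsf{sec}}),\ c_i \leftarrow \mathsf{Enc}_{pk}(x_i, r_i),\ \|x_i\|_\infty \le B_{\mathsf{plain}},\ \mathsf{decode}(x_i) \in (\mathbb{F}_{p^k})^s,\ \|r_i\|_\infty \le B_{\mathsf{rand}} \Big\}.$$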

The zero-knowledge and completeness properties hold only if the ciphertexts $c_i$ satisfy $\|x_i\|_\infty \le \tau$ and $\|r_i\|_\infty \le \rho$.

In our preprocessing protocol, players will be required to give such a ZKPoPK for all ciphertexts they provide. By admissibility of the cryptosystem, this will imply that every ciphertext occurring in the protocol will be $(B_{\mathsf{plain}}, B_{\mathsf{rand}}, C)$-admissible and can therefore be decrypted correctly. The ZKPoPK can also be called with a flag diag, which modifies the proof so that it additionally proves that $\mathsf{decode}(x_i)$ is a diagonal element.

The protocol is not meant to implement an ideal functionality, but we can still use it and prove UC security for the main protocol, since we will always generate the challenge $e$ by calling the $\mathcal{F}_{\mathsf{Rand}}$ ideal functionality (see Appendix E). Hence the honest-verifier ZK property implies straight-line simulation⁵. As for knowledge extraction, the UC simulator we construct in our security proof will know the secret key for the cryptosystem and can therefore extract a dishonest prover’s witness simply by decrypting. In the reduction to show that the simulator works, we do not know the secret key, but here we are allowed to do extraction by rewinding.

The protocol and its proof of security are given in Appendix A.1, Figure 9, and its computational complexity per ciphertext is essentially the cost of a constant number of encryptions. In Appendix A.1, we also give a variant of the ZK proof that allows even smaller values for $B_{\mathsf{plain}}$ and $B_{\mathsf{rand}}$, namely $B_{\mathsf{plain}} = N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{\ldots}$ and $B_{\mathsf{rand}} = d \cdot \rho \cdot \mathsf{sec}^2 \cdot 2^{\ldots}$, and hence improves performance further.

This  variant  is  most  efficient  when  executed  using  the  Fiat-Shamir  heuristic  (although  it  can  also 
work  without  random  oracles),  and  we  believe  this  variant  is  the  best  for  a  practical  implementation. 

5  The  Preprocessing  Phase 

In this section we construct the protocol $\Pi_{\mathsf{PREP}}$, which securely implements the functionality $\mathcal{F}_{\mathsf{PREP}}$ (specified in Figure 16) in the presence of the functionalities $\mathcal{F}_{\mathsf{KeyGenDec}}$ (Figure 3) and $\mathcal{F}_{\mathsf{Rand}}$ (Figure 14). The preprocessing uses the above abstract cryptosystem with $M = (\mathbb{F}_{p^k})^s$, but the online phase is designed for messages in $\mathbb{F}_{p^k}$. Therefore, we extend the notation $\langle\cdot\rangle$ and $[\cdot]$ to messages in $M$: since addition and multiplication on $M$ are componentwise, for $m = (m_1, \ldots, m_s)$ we define $\langle m \rangle = (\langle m_1 \rangle, \ldots, \langle m_s \rangle)$, and similarly for $[m]$. Conversely, once a representation (or a pair, or a triple) on vectors is produced in the preprocessing, it is disassembled into its coordinates, so that it can be used in the online phase. In Figures 4, 5 and 6, we introduce subprotocols that are accessed by the main preprocessing protocol in several steps. Note that the subprotocols are not meant to implement ideal functionalities: their purpose is merely to summarize parts of the main protocol that are repeated on various occasions. Theorem 3 below is proved in Appendix A.5.


⁵ $\mathcal{F}_{\mathsf{Rand}}$ can be implemented by standard methods, and the complexity of this is not significant for the main protocol, since we may use the same challenge for many instances of the proof, and each proof handles sec ciphertexts.
Theorem 3. The protocol $\Pi_{\mathsf{PREP}}$ (Figure 7) implements $\mathcal{F}_{\mathsf{PREP}}$ with computational security against any static, active adversary corrupting up to $n - 1$ parties, in the $\mathcal{F}_{\mathsf{KeyGen}}$, $\mathcal{F}_{\mathsf{Rand}}$-hybrid model when the underlying cryptosystem is admissible⁶.


Protocol  Reshare 

Usage: Input is $e_m$, where $e_m = \mathsf{Enc}_{pk}(m)$ is a public ciphertext, and a parameter enc, where enc = NewCiphertext or enc = NoNewCiphertext. Output is a share $m_i$ of $m$ to each player $P_i$, and, if enc = NewCiphertext, a ciphertext $e'_m$. The idea is that $e_m$ could be a product of two ciphertexts, which Reshare converts to a “fresh” ciphertext $e'_m$. Since Reshare uses distributed decryption (which may return an incorrect result), it is not guaranteed that $e_m$ and $e'_m$ contain the same value, but it is guaranteed that $\sum_i m_i$ is the value contained in $e'_m$.

Reshare($e_m$, enc):

1. Each player $P_i$ samples a uniform $f_i \in (\mathbb{F}_{p^k})^s$. Define $f := \sum_{i=1}^n f_i$.

2. Each player $P_i$ computes and broadcasts $e_{f_i} \leftarrow \mathsf{Enc}_{pk}(f_i)$.

3. Each player $P_i$ runs $\Pi_{\mathsf{ZKPoPK}}$ acting as a prover on $e_{f_i}$. The protocol aborts if any proof fails.

4. The players compute $e_f \leftarrow e_{f_1} \boxplus \cdots \boxplus e_{f_n}$, and $e_{m+f} \leftarrow e_m \boxplus e_f$.

5. The players invoke $\mathcal{F}_{\mathsf{KeyGenDec}}$ to decrypt $e_{m+f}$ and thereby obtain $m + f$.

6. $P_1$ sets $m_1 \leftarrow m + f - f_1$, and each player $P_i$ ($i \ne 1$) sets $m_i \leftarrow -f_i$.

7. If enc = NewCiphertext, all players set $e'_m \leftarrow \mathsf{Enc}_{pk}(m + f) \boxminus e_{f_1} \boxminus \cdots \boxminus e_{f_n}$, where a default value for the randomness is used when computing $\mathsf{Enc}_{pk}(m + f)$.

Fig. 4. The sub-protocol for additively secret sharing a plaintext $m \in (\mathbb{F}_{p^k})^s$ on input a ciphertext $e_m = \mathsf{Enc}_{pk}(m)$.
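The share arithmetic of Reshare can be checked without any encryption. The following sketch is a plaintext-level model only: it assumes $k = 1$ (so the plaintext space is $\mathbb{F}_p^s$), replaces $\mathsf{Enc}$, $\Pi_{\mathsf{ZKPoPK}}$ and $\mathcal{F}_{\mathsf{KeyGenDec}}$ by direct computation mod $p$, and uses a hypothetical `decrypt_offset` argument to model an adversary who perturbs the decrypted value $m + f$.

```python
import secrets


def reshare(m, n, p, decrypt_offset=None):
    """Plaintext-level model of Reshare on a vector m in F_p^s.

    Returns the players' shares and the value held by the fresh ciphertext.
    """
    s = len(m)
    f = [[secrets.randbelow(p) for _ in range(s)] for _ in range(n)]  # step 1
    f_sum = [sum(col) % p for col in zip(*f)]
    opened = [(mj + fj) % p for mj, fj in zip(m, f_sum)]              # steps 4-5
    if decrypt_offset is not None:                                    # adversarial shift
        opened = [(o + e) % p for o, e in zip(opened, decrypt_offset)]
    # Step 6: P_1 keeps (m + f) - f_1, every other P_i keeps -f_i.
    shares = [[(o - x) % p for o, x in zip(opened, f[0])]]
    shares += [[(-x) % p for x in f[i]] for i in range(1, n)]
    # Step 7: Enc(m + f) minus all e_{f_i} holds (opened value) - f.
    fresh = [(o - x) % p for o, x in zip(opened, f_sum)]
    return shares, fresh


p = 101
shares, fresh = reshare([5, 17, 99], n=3, p=p)
print([sum(col) % p for col in zip(*shares)], fresh)  # [5, 17, 99] [5, 17, 99]
```

With an offset the shares no longer sum to $m$, but they still sum to the value held by the fresh ciphertext, which is exactly the guarantee stated in the Usage paragraph.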


Protocol  PBracket 

Usage: On input shares $v_1, \ldots, v_n$ privately held by the players and a public ciphertext $e_v$, this protocol generates $[v]$. It is assumed that $\sum_i v_i$ is the plaintext contained in $e_v$.

PBracket($v_1, \ldots, v_n, e_v$):

1. For $i = 1, \ldots, n$:

(a) All players set $e_{\gamma_i} \leftarrow e_{\beta_i} \boxtimes e_v$ (note that $e_{\beta_i}$ is generated during the initialization process, and known by every player).

(b) Players generate $(\gamma_i^1, \ldots, \gamma_i^n) \leftarrow \mathsf{Reshare}(e_{\gamma_i}, \mathsf{NoNewCiphertext})$, so each player $P_j$ gets a share $\gamma_i^j$ of $v \cdot \beta_i$.

2. Output the representation $[v] = (v_1, \ldots, v_n, (\beta_i, \gamma_i^1, \ldots, \gamma_i^n)_{i=1,\ldots,n})$.

Fig. 5. The sub-protocol for generating $[v]$.


Protocol  PAngle 

Usage: On input shares $v_1, \ldots, v_n$ privately held by the players and a public ciphertext $e_v$, this protocol generates $\langle v \rangle$. It is assumed that $\sum_i v_i$ is the plaintext contained in $e_v$.

PAngle($v_1, \ldots, v_n, e_v$):

1. All players set $e_{v \cdot \alpha} \leftarrow e_v \boxtimes e_\alpha$ (note that $e_\alpha$ is generated during the initialization process, and known by every player).

2. Players generate $(\gamma_1, \ldots, \gamma_n) \leftarrow \mathsf{Reshare}(e_{v \cdot \alpha}, \mathsf{NoNewCiphertext})$, so each player $P_i$ gets a share $\gamma_i$ of $\alpha \cdot v$.

3. Output the representation $\langle v \rangle = (0, v_1, \ldots, v_n, \gamma_1, \ldots, \gamma_n)$.

Fig. 6. The sub-protocol for generating $\langle v \rangle$.
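A plaintext-level model of the two representations makes the MAC relations explicit. This sketch assumes $k = s = 1$, replaces the ciphertext products and Reshare by direct computation with a hypothetical helper `additive_share`, and lets the code see $\alpha$ and the $\beta_i$, which in the protocol exist only in encrypted form.

```python
import secrets


def additive_share(x, n, p):
    """Hypothetical stand-in for Reshare(..., NoNewCiphertext): n uniform shares of x mod p."""
    parts = [secrets.randbelow(p) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % p]


def p_angle(v_shares, alpha, p):
    """<v> = (0, v_1..v_n, gamma_1..gamma_n) with sum(gamma_i) = alpha * v mod p."""
    v = sum(v_shares) % p  # plaintext of e_v; e_v (x) e_alpha would hold alpha * v
    gammas = additive_share(alpha * v % p, len(v_shares), p)
    return (0, list(v_shares), gammas)


def p_bracket(v_shares, betas, p):
    """[v] = (v_1..v_n, [(beta_i, [gamma_i^1..gamma_i^n]) for each i]),
    with sum_j gamma_i^j = beta_i * v mod p."""
    n, v = len(v_shares), sum(v_shares) % p
    return (list(v_shares), [(b, additive_share(b * v % p, n, p)) for b in betas])


p, alpha, betas = 101, 7, [3, 11, 50]
delta, vs, gammas = p_angle([10, 20, 30], alpha, p)
print(delta, sum(gammas) % p)  # 0 16
```

In $[v]$, player $P_j$ would hold $\beta_j$ and the shares $\gamma_i^j$ for all $i$; the sketch simply returns the full tuple.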


⁶ The definition of admissible cryptosystem demands a decryption protocol that implements $\mathcal{F}_{\mathsf{KeyGenDec}}$ based on $\mathcal{F}_{\mathsf{KeyGen}}$; hence the theorem only assumes $\mathcal{F}_{\mathsf{KeyGen}}$.




Protocol $\Pi_{\mathsf{PREP}}$

Usage: The Triple step is always executed sec times in parallel. This ensures that when calling $\Pi_{\mathsf{ZKPoPK}}$, we can always give it the sec ciphertexts it requires as input. In addition, both $\Pi_{\mathsf{ZKPoPK}}$ and $\Pi_{\mathsf{PREP}}$ can be executed in a SIMD fashion, i.e. they are data-oblivious except when they detect an error. Thus we can execute $\Pi_{\mathsf{ZKPoPK}}$ and $\Pi_{\mathsf{PREP}}$ on the packed plaintext space $(\mathbb{F}_{p^k})^s$. Thereby, we generate $s \cdot \mathsf{sec}$ elements in one go and then buffer the generated triples, outputting the next unused one on demand.

Initialize: This step generates the global key $\alpha$ and “personal keys” $\beta_i$.

1. The players call “start” on $\mathcal{F}_{\mathsf{KeyGenDec}}$ to obtain the public key $pk$.

2. Each player $P_i$ generates a MAC key $\beta_i \in \mathbb{F}_{p^k}$.

3. Each player $P_i$ generates $\alpha_i \in \mathbb{F}_{p^k}$. Let $\alpha := \sum_{i=1}^n \alpha_i$.

4. Each player $P_i$ computes and broadcasts $e_{\alpha_i} \leftarrow \mathsf{Enc}_{pk}(\mathsf{Diag}(\alpha_i))$, $e_{\beta_i} \leftarrow \mathsf{Enc}_{pk}(\mathsf{Diag}(\beta_i))$.

5. Each player $P_i$ invokes $\Pi_{\mathsf{ZKPoPK}}$ (with diag set to true) acting as prover on input $(e_{\alpha_i}, \ldots, e_{\alpha_i})$ and on input $(e_{\beta_i}, \ldots, e_{\beta_i})$, where $e_{\alpha_i}$ and $e_{\beta_i}$ are repeated sec times, which is the number of ciphertexts $\Pi_{\mathsf{ZKPoPK}}$ requires as input. (This is not very efficient, but only needs to be done once for each player.)

6. All players compute $e_\alpha \leftarrow e_{\alpha_1} \boxplus \cdots \boxplus e_{\alpha_n}$, and generate $[\mathsf{Diag}(\alpha)] \leftarrow \mathsf{PBracket}(\mathsf{Diag}(\alpha_1), \ldots, \mathsf{Diag}(\alpha_n), e_\alpha)$.

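Pair: This step generates a pair $[r]$, $\langle r \rangle$, and can be used to generate a single value $[r]$ by not performing the call to PAngle.

1. Each player $P_i$ generates $r_i \in (\mathbb{F}_{p^k})^s$. Let $r := \sum_{i=1}^n r_i$.

2. Each player $P_i$ computes and broadcasts $e_{r_i} \leftarrow \mathsf{Enc}_{pk}(r_i)$. Let $e_r = e_{r_1} \boxplus \cdots \boxplus e_{r_n}$.

3. Each player $P_i$ invokes $\Pi_{\mathsf{ZKPoPK}}$ acting as prover on the ciphertext he generated.

4. Players generate $[r] \leftarrow \mathsf{PBracket}(r_1, \ldots, r_n, e_r)$, $\langle r \rangle \leftarrow \mathsf{PAngle}(r_1, \ldots, r_n, e_r)$.

Triple: This step generates a multiplicative triple $\langle a \rangle, \langle b \rangle, \langle c \rangle$.

1. Each player $P_i$ generates $a_i, b_i \in (\mathbb{F}_{p^k})^s$. Let $a := \sum_{i=1}^n a_i$ and $b := \sum_{i=1}^n b_i$.

2. Each player $P_i$ computes and broadcasts $e_{a_i} \leftarrow \mathsf{Enc}_{pk}(a_i)$, $e_{b_i} \leftarrow \mathsf{Enc}_{pk}(b_i)$.

3. Each player $P_i$ invokes $\Pi_{\mathsf{ZKPoPK}}$ acting as prover on the ciphertexts he generated.

4. The players set $e_a \leftarrow e_{a_1} \boxplus \cdots \boxplus e_{a_n}$ and $e_b \leftarrow e_{b_1} \boxplus \cdots \boxplus e_{b_n}$.

5. Players generate $\langle a \rangle \leftarrow \mathsf{PAngle}(a_1, \ldots, a_n, e_a)$, $\langle b \rangle \leftarrow \mathsf{PAngle}(b_1, \ldots, b_n, e_b)$.

6. All players compute $e_c \leftarrow e_a \boxtimes e_b$.

7. Players set $(c_1, \ldots, c_n, e'_c) \leftarrow \mathsf{Reshare}(e_c, \mathsf{NewCiphertext})$.

8. Players generate $\langle c \rangle \leftarrow \mathsf{PAngle}(c_1, \ldots, c_n, e'_c)$.

Fig. 7. The protocol for constructing the global key $[\alpha]$, pairs $[r]$, $\langle r \rangle$ and multiplicative triples $\langle a \rangle, \langle b \rangle, \langle c \rangle$.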
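The invariant produced by the Triple step, $c = a \cdot b$ in every coordinate together with MAC shares of $\alpha \cdot c$, can be checked with a plaintext model. The sketch assumes $k = 1$, replaces the encryptions, ZK proofs, $\boxtimes$ and Reshare by direct arithmetic mod $p$ (with a hypothetical helper `share`), and computes the MAC shares of $c$ directly instead of running PAngle.

```python
import secrets


def share(x, n, p):
    """Hypothetical stand-in for Reshare: n uniform additive shares of x mod p."""
    parts = [secrets.randbelow(p) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % p]


def triple_step(n, s, p, alpha):
    """Plaintext model of the Triple step on the packed space F_p^s.

    Returns share matrices indexed [player][coordinate] for a, b, c and for
    the MAC values alpha * c that PAngle would produce in step 8.
    """
    def random_shares():
        return [[secrets.randbelow(p) for _ in range(s)] for _ in range(n)]

    a_sh, b_sh = random_shares(), random_shares()                # step 1
    a = [sum(col) % p for col in zip(*a_sh)]
    b = [sum(col) % p for col in zip(*b_sh)]
    c = [x * y % p for x, y in zip(a, b)]                        # step 6
    c_sh = [list(r) for r in zip(*(share(cj, n, p) for cj in c))]                # step 7
    mac_sh = [list(r) for r in zip(*(share(alpha * cj % p, n, p) for cj in c))]  # step 8
    return a_sh, b_sh, c_sh, mac_sh


p, n, s, alpha = 10007, 3, 4, 1234
a_sh, b_sh, c_sh, mac_sh = triple_step(n, s, p, alpha)
for j in range(s):
    a, b, c = (sum(M[i][j] for i in range(n)) % p for M in (a_sh, b_sh, c_sh))
    assert c == a * b % p
    assert sum(mac_sh[i][j] for i in range(n)) % p == alpha * c % p
```

One run yields $s$ triples at once, one per coordinate of the packed space, which is the SIMD behaviour described in the Usage paragraph.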


6  Concrete  Instantiation  of  the  Abstract  Scheme  based  on  LWE 

We  now  describe  the  concrete  scheme,  which  is  based  on  the  somewhat  homomorphic  encryption 
scheme  of  Brakerski  and  Vaikuntanathan  (BV)  [7].  The  main  differences  are  that  we  are  only 
interested  in  evaluation  of  circuits  of  multiplicative  depth  one,  we  are  interested  in  performing 
operations  in  parallel  on  multiple  data  items,  and  we  require  a  distributed  decryption  procedure. 
In  this  section  we  detail  the  scheme  and  the  distributed  decryption  procedure;  in  Appendix  D  we 
discuss  security  of  the  scheme,  and  present  some  sample  parameter  sizes  and  performance  figures. 

ParamGen($1^\kappa$, $M$): Recall that the message space is given by $M = (\mathbb{F}_{p^k})^s$ for two integers $k$ and $s$ and a prime $p$, i.e. the message space is $s$ copies of the finite field $\mathbb{F}_{p^k}$. To map this to our scheme below, one first finds a cyclotomic polynomial $F(X) := \Phi_m(X)$ of degree $N := \phi(m)$, where $N$ is lower bounded by some function of the security parameter $\kappa$. The polynomial $F(X)$ needs to be such that modulo $p$ it factors into $l'$ irreducible factors of degree $k'$, where $l' \ge s$ and $k$ divides $k'$. We then define an algebra $A_p := \mathbb{F}_p[X]/F(X)$, and we have an embedding $\varphi : M \to A_p$ of $M$ into $A_p$. By “lifting” modulo $p$ we see that there is a natural inclusion $\iota : A_p \to \mathbb{Z}^N$, which maps a polynomial of degree less than $N$ with coefficients in $\mathbb{F}_p$ to the integer vector of length $N$ with coefficients in the range $(-p/2, \ldots, p/2]$. The encode function is then defined by $\iota(\varphi(m))$ for $m \in (\mathbb{F}_{p^k})^s$, with decode defined by $\varphi^{-1}(x \bmod p)$ for $x \in \mathbb{Z}^N$. It is clear, by choice of the natural inclusion $\iota$, that $\|\mathsf{encode}(m)\|_\infty \le p/2 = \tau$.
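The inclusion $\iota$ is a centred lift of residues. The sketch below covers only that part of encode and decode, assuming the plaintext is already a coefficient vector over $\mathbb{F}_p$; the embedding $\varphi$ (which depends on the factorisation of $F(X)$ modulo $p$) is omitted, and the names `lift` and `unlift` are illustrative.

```python
def lift(coeffs, p):
    """iota: map residues mod p to integers in the centred range (-p/2, p/2]."""
    return [c - p if c > p // 2 else c for c in (x % p for x in coeffs)]


def unlift(vec, p):
    """Reduce an integer vector mod p (the 'x mod p' step of decode)."""
    return [x % p for x in vec]


p = 17
v = lift(range(p), p)
print(max(abs(x) for x in v))  # 8, i.e. at most p/2 = tau
```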

We pick a large integer $q$, whose size we will determine later, and define $A_q := (\mathbb{Z}/q\mathbb{Z})[X]/F(X)$, i.e. the ring of integer polynomials modulo reduction by $F(X)$ and $q$. In practice we consider the image of encode to lie in $A_q$, and thus we abuse notation by writing addition and multiplication in $A_q$ as $+$ and $\cdot$. Note that this means that applying decode to elements obtained from encode followed by a series of arithmetic operations may not result in the value in $M$ that one would expect. This corresponds to the fact that our scheme can only evaluate circuits from a given set $C$.

The ciphertext space $G$ is defined to be $A_q^3$, with addition $\boxplus$ defined componentwise. The multiplicative operator $\boxtimes$ is defined as follows:

$$(a_0, a_1, 0) \boxtimes (b_0, b_1, 0) := (a_0 \cdot b_0,\ a_1 \cdot b_0 + a_0 \cdot b_1,\ -a_1 \cdot b_1),$$

i.e. multiplication is only defined on elements whose third coefficient is zero.
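To see why the third component carries a minus sign, apply the decryption formula $t = c_0 - (s \cdot c_1) - (s \cdot s \cdot c_2)$ from $\mathsf{Dec}_{sk}$ below to the product $(c_0, c_1, c_2) = (a_0, a_1, 0) \boxtimes (b_0, b_1, 0)$; this short derivation is added for illustration:

```latex
\begin{aligned}
c_0 - s\,c_1 - s^2 c_2
  &= a_0 b_0 - s\,(a_1 b_0 + a_0 b_1) + s^2\, a_1 b_1 \\
  &= (a_0 - s\,a_1)\,(b_0 - s\,b_1).
\end{aligned}
```

So the noise of the product is the product of the two input noises $a_0 - s \cdot a_1$ and $b_0 - s \cdot b_1$; since each is congruent mod $p$ to its plaintext, the product still decrypts to the product of the plaintexts as long as its coefficients stay below $q/2$ in absolute value.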

We define $D^d_\rho$ as follows. The discrete Gaussian $D_{\mathbb{Z}^N, s}$ with Gaussian parameter $s$ is defined to be the random variable on $\mathbb{Z}^N$ (centered around the origin) obtained by sampling $x \in \mathbb{R}^N$ with probability proportional to $\exp(-\pi \cdot \|x\|^2/s^2)$, then rounding the result to the nearest lattice point and reducing it modulo $q$. Note that sampling from the distribution with probability density function proportional to $\exp(-\pi \cdot \|x\|^2/s^2)$ means using a normal variate with mean zero and standard deviation $r := s/\sqrt{2\pi}$. In our concrete scheme we set $d := 3 \cdot N$ and define $D^d_\rho$ to be the distribution of three independent samples from this discrete Gaussian, i.e. a distribution on $(\mathbb{Z}^N)^3 = \mathbb{Z}^d$. Note that in the notation the implicit dependence on $q$ has been suppressed to ease readability. We leave the determination of $q$ and $r$ as functions of all the other parameters until we discuss the security of the scheme.
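The sampling procedure above can be sketched with Python's standard `random` module. This follows the text literally (sample a continuous Gaussian, round, reduce mod $q$); it is not an exact discrete Gaussian sampler, and `random` is not a cryptographically secure source, so the sketch is for illustration only. The function name is hypothetical.

```python
import math
import random


def rounded_gaussian(N, s, q, rng=random):
    """Sample x in R^N with density proportional to exp(-pi * ||x||^2 / s^2),
    round each coordinate to the nearest integer and reduce mod q.

    Each coordinate is a normal variate with mean 0 and standard
    deviation r = s / sqrt(2 * pi).
    """
    r = s / math.sqrt(2 * math.pi)
    return [round(rng.gauss(0.0, r)) % q for _ in range(N)]


rng = random.Random(1)
print(rounded_gaussian(8, 10.0, 2**31, rng))  # eight residues mod 2^31
```

Negative samples appear as residues close to $q$; centring them (subtracting $q$ from values above $q/2$) recovers the small integers.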

KeyGen(): We will use the public-key version of the Brakerski-Vaikuntanathan scheme [7]. Given the above set-up, key generation proceeds as follows. First one samples an element $a \leftarrow A_q$ and elements $s, e$ from the discrete Gaussian defined above. Then, treating $s$ and $e$ as elements of $A_q$, one computes $b \leftarrow (a \cdot s) + (p \cdot e)$. The public and private key are then set to be $pk \leftarrow (a, b)$ and $sk \leftarrow s$.

$\mathsf{Enc}_{pk}(x, r)$: Given a message $x \leftarrow \mathsf{encode}(m)$, where $m \in M$, and randomness $r$ drawn from $D^d_\rho$, we proceed as follows. The element $r$ is parsed as $(u, v, w) \in (\mathbb{Z}^N)^3$. Then the encryptor computes $c_0 \leftarrow (b \cdot v) + (p \cdot w) + x$ and $c_1 \leftarrow (a \cdot v) + (p \cdot u)$, and finally returns the ciphertext $(c_0, c_1, 0)$.

$\mathsf{Dec}_{sk}(c)$: Given a secret key $sk = s$ and a ciphertext $c = (c_0, c_1, c_2)$, this algorithm computes the element $t \in A_q$ satisfying $t = c_0 - (s \cdot c_1) - (s \cdot s \cdot c_2)$. On reduction by $q$, the value of $\|t\|_\infty$ will be bounded by a relatively small constant $B$, assuming of course that the “noise” within a ciphertext has not grown too large. We shall refer to the value $t \bmod q$ as the “noise”, despite it also containing the message to be decrypted. At this point the decryptor simply reduces $t$ modulo $p$ to obtain the desired plaintext in $A_q$, which can then be decoded via the decode algorithm.
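For a fresh ciphertext ($c_2 = 0$), substituting the key relation $b = (a \cdot s) + (p \cdot e)$ from KeyGen shows why reducing $t$ modulo $p$ recovers the encoding; this derivation is added for illustration:

```latex
\begin{aligned}
c_0 - s \cdot c_1
  &= (b \cdot v + p \cdot w + x) - s \cdot (a \cdot v + p \cdot u) \\
  &= (a \cdot s + p \cdot e) \cdot v + p \cdot w + x - a \cdot s \cdot v - p \cdot s \cdot u \\
  &= x + p \cdot (e \cdot v + w - s \cdot u).
\end{aligned}
```

Hence $t \equiv x \pmod p$, provided the centred representative of $t$ modulo $q$ equals this expression, i.e. its coefficients stay below $q/2$ in absolute value.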
$\mathsf{KeyGen}^*()$: This simply samples $a, b \leftarrow A_q$ and returns $\widetilde{pk} := (a, b)$.

Following the discussion in [7], we see that with this fixed ciphertext space our scheme is somewhat homomorphic. It can support a relatively large number of addition operations, and a single multiplication.
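A toy version of the scheme shows one addition and the single multiplication decrypting correctly. This is a sketch under several stated assumptions: $F(X) = X^N + 1$ (the cyclotomic polynomial $\Phi_{2N}$ for $N$ a power of two), tiny insecure parameters, noise coefficients drawn uniformly from $\{-1, 0, 1\}$ in place of the discrete Gaussian, and plaintexts taken directly as polynomials mod $p$ without the embedding $\varphi$. All function names are ours.

```python
import random

# Toy parameters (hypothetical, far too small to be secure).
N, p, q = 16, 17, 2**40


def pmul(a, b):
    """Multiply two polynomials in Z_q[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:                       # X^N = -1
                res[i + j - N] -= ai * bj
    return [x % q for x in res]


def padd(*polys):
    return [sum(cs) % q for cs in zip(*polys)]


def pscale(c, a):
    return [c * x % q for x in a]


def small(rng):
    # Stand-in for the discrete Gaussian: coefficients uniform in {-1, 0, 1}.
    return [rng.choice((-1, 0, 1)) % q for _ in range(N)]


def keygen(rng):
    a = [rng.randrange(q) for _ in range(N)]
    s, e = small(rng), small(rng)
    return (a, padd(pmul(a, s), pscale(p, e))), s      # pk = (a, b = a*s + p*e)


def enc(pk, x, rng):
    a, b = pk
    u, v, w = small(rng), small(rng), small(rng)       # r = (u, v, w)
    c0 = padd(pmul(b, v), pscale(p, w), [xi % q for xi in x])
    c1 = padd(pmul(a, v), pscale(p, u))
    return (c0, c1, [0] * N)


def ct_add(c, d):
    return tuple(padd(ci, di) for ci, di in zip(c, d))


def ct_mul(c, d):
    assert not any(c[2]) and not any(d[2]), "only defined when c2 = 0"
    (a0, a1, _), (b0, b1, _) = c, d
    return (pmul(a0, b0), padd(pmul(a1, b0), pmul(a0, b1)),
            pscale(-1, pmul(a1, b1)))


def dec(s, c):
    c0, c1, c2 = c
    t = padd(c0, pscale(-1, pmul(s, c1)), pscale(-1, pmul(pmul(s, s), c2)))
    t = [x - q if x > q // 2 else x for x in t]       # centred reduction mod q
    return [x % p for x in t]                         # then reduce mod p


rng = random.Random(0)
pk, sk = keygen(rng)
x, y = [1] + [0] * (N - 1), [2] * N
print(dec(sk, ct_mul(enc(pk, x, rng), enc(pk, y, rng))))  # sixteen 2s: x is the constant 1
```

With these parameters the noise of a product stays far below $q/2 = 2^{39}$; a second multiplication is not possible, since $\boxtimes$ is only defined when the third component is zero.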

Distributed Version. We now extend the scheme above to enable distributed decryption. We first set up the distributed keys as follows. After invoking the functionality for key generation, each player $P_i$ obtains a share $sk_i = (s_{i,1}, s_{i,2})$; these are chosen uniformly such that the master secret is written as

$$s = s_{1,1} + \cdots + s_{n,1}, \qquad s \cdot s = s_{1,2} + \cdots + s_{n,2}.$$

As remarked earlier, this one-time setup procedure can be accomplished by standard UC-secure multiparty computation protocols such as that described in [5]. The following theorem is proved in Appendix A.6. It depends on the constant $B$ defined above. In Appendix D we compute the value of $B$ when the input ciphertext is $(B_{\mathsf{plain}}, B_{\mathsf{rand}}, C)$-admissible, and show how to choose parameters for the cryptosystem such that the required bound on $B$ is satisfied.

Theorem 4. In the $\mathcal{F}_{\mathsf{KeyGen}}$-hybrid model, the protocol $\Pi_{\mathsf{DDec}}$ (Figure 8) implements $\mathcal{F}_{\mathsf{KeyGenDec}}$ with statistical security against any static active adversary corrupting up to $n - 1$ parties if $B + 2^{\mathsf{sec}} \cdot B < q/2$.


Protocol $\Pi_{\mathsf{DDec}}$

Initialize: Each party $P_i$, on being given the ciphertext $c = (c_0, c_1, c_2)$ and an upper bound $B$ on the infinity norm of $t$ above, computes

$$v_i \leftarrow \begin{cases} c_0 - (s_{i,1} \cdot c_1) - (s_{i,2} \cdot c_2) & \text{if } i = 1, \\ -(s_{i,1} \cdot c_1) - (s_{i,2} \cdot c_2) & \text{if } i \ne 1, \end{cases}$$

and sets $t_i \leftarrow v_i + p \cdot r_i$, where $r_i$ is a random element with infinity norm bounded by $2^{\mathsf{sec}} \cdot B/(n \cdot p)$.

Public Decryption: All the players are supposed to learn the message.

- Each party $P_i$ broadcasts $t_i$.

- All players compute $t' \leftarrow t_1 + \cdots + t_n$ and obtain a message $m' \leftarrow \mathsf{decode}(t' \bmod p)$.

Private Decryption: Only player $P_j$ is supposed to learn the message.

- Each party $P_i$ sends $t_i$ to $P_j$.

- $P_j$ computes $t' \leftarrow t_1 + \cdots + t_n$ and obtains a message $m' \leftarrow \mathsf{decode}(t' \bmod p)$.


Fig.  8.  The  distributed  decryption  protocol. 
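The public-decryption branch of $\Pi_{\mathsf{DDec}}$ can be exercised with toy values. The sketch assumes the ring $\mathbb{Z}_q[X]/(X^N + 1)$, omits the embedding $\varphi$ inside decode, and, instead of encrypting, builds a ciphertext $(c_0, c_1, c_2)$ directly from a chosen noise term $t$ so that $c_2 \ne 0$ and both key shares $s_{i,1}$, $s_{i,2}$ are exercised. Parameter values and function names are hypothetical; the module-level assert checks the condition of Theorem 4.

```python
import random

# Toy parameters (hypothetical): ring Z_q[X]/(X^N + 1), n players.
N, p, n, sec = 8, 17, 3, 20
B = 1000                        # assumed bound on ||t||_inf for the input ciphertext
q = 2**40
assert B + 2**sec * B < q / 2   # condition of Theorem 4


def pmul(a, b):
    """Multiply in Z_q[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj
    return [x % q for x in res]


def padd(*polys):
    return [sum(cs) % q for cs in zip(*polys)]


def pneg(a):
    return [-x % q for x in a]


def additive_shares(x, rng):
    """Split x into n uniform additive shares mod q."""
    parts = [[rng.randrange(q) for _ in range(N)] for _ in range(n - 1)]
    return parts + [padd(x, *(pneg(pt) for pt in parts))]


def ddec_public(c, key_shares, rng):
    """Public decryption: each P_i broadcasts t_i and everyone sums them."""
    c0, c1, c2 = c
    r_bound = 2**sec * B // (n * p)
    ts = []
    for i, (s1, s2) in enumerate(key_shares):
        v = padd(pneg(pmul(s1, c1)), pneg(pmul(s2, c2)))
        if i == 0:                                   # only P_1 adds c_0
            v = padd(c0, v)
        r = [rng.randint(-r_bound, r_bound) for _ in range(N)]
        ts.append(padd(v, [p * ri % q for ri in r]))  # t_i = v_i + p * r_i
    t = padd(*ts)
    t = [x - q if x > q // 2 else x for x in t]      # centre mod q
    return [x % p for x in t]                        # t' mod p (phi omitted)
```

Private decryption differs only in that the $t_i$ are sent to $P_j$ instead of being broadcast. The masks $p \cdot r_i$ change $t'$ only by a multiple of $p$, so the decoded message is unaffected as long as the total stays below $q/2$, which is what the condition of Theorem 4 guarantees.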
A Proofs

A.1 Zero-Knowledge Proof

Construction of the Protocol. We will give two versions of the protocol. The first is a standard 3-move protocol; the second uses an “abort” technique to optimize the parameter values. The latter is best suited for use with the Fiat-Shamir heuristic, and may be the best option for a practical implementation.

For the protocol, we will need that $\tau = p/2$, so that $\|\mathsf{encode}(m)\|_\infty \le \tau = p/2$. This means that each entry in $\mathsf{encode}(m)$ corresponds to a uniquely determined residue mod $p$ (or equivalently an element in $\mathbb{Z}_p$), and conversely each such residue is uniquely determined by $m$. We did not ask for this in the abstract description, but the concrete instantiation satisfies it. Note that one problem we need to address in the protocol is that not all vectors in the input domain of decode will give us results in $(\mathbb{F}_{p^k})^s$. However, if an input is equivalent mod $p$ to $\mathsf{encode}(m)$ for some $m$, then this is indeed the case, since then decode will return $m$. Therefore the verifier explicitly checks whether the encodings the prover sends him decode to legal values; this will imply that the ciphertexts in question also decode to legal values.

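We let $R$ denote the matrix in $\mathbb{Z}^{\mathsf{sec} \times d}$ whose $i$th row is $r_i$. The protocol makes use of a matrix $M_e$ defined as follows. Let $V := 2 \cdot \mathsf{sec} - 1$. For $e \in \{0,1\}^{\mathsf{sec}}$, we define $M_e \in \mathbb{Z}^{V \times \mathsf{sec}}$ to be the matrix whose $(i, k)$-th entry is given by $e_{i-k+1}$ for $1 \le i - k + 1 \le \mathsf{sec}$, and 0 otherwise.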
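The definition of $M_e$, and the triangular structure used in the soundness proof of Theorem 5, can be checked mechanically. The sketch uses 0-based indices internally (so $e_{i-k+1}$ becomes `e[i - k]`); `build_M` and `soundness_block` are illustrative names.

```python
def build_M(e):
    """M_e in Z^{V x sec}, V = 2*sec - 1; entry (i, k) is e_{i-k+1} (1-indexed)."""
    sec = len(e)
    return [[e[i - k] if 0 <= i - k < sec else 0 for k in range(sec)]
            for i in range(2 * sec - 1)]        # 0-indexed rows i, columns k


def soundness_block(e, e2):
    """Rows j..j+sec-1 (1-indexed) of M_e - M_e2, j the highest index where e, e2 differ."""
    sec = len(e)
    j = max(i for i in range(1, sec + 1) if e[i - 1] != e2[i - 1])
    M1, M2 = build_M(e), build_M(e2)
    block = [[M1[r][k] - M2[r][k] for k in range(sec)]
             for r in range(j - 1, j - 1 + sec)]
    return block, j


print(build_M([1, 0, 1]))
# [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1]]
```

Each column of $M_e$ is a shifted copy of $e$, which is why, below the row where the highest differing index $j$ sits on the diagonal, all entries of $M_e - M_{e'}$ vanish.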


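Protocol $\Pi_{\mathsf{ZKPoPK}}$

- For $i = 1, \ldots, V$, the prover sets $y_i \leftarrow \mathbb{Z}^N$ and $s_i \leftarrow \mathbb{Z}^d$, such that $\|y_i\|_\infty \le N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec} - 1}$ and $\|s_i\|_\infty \le d \cdot \rho \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec} - 1}$. For $y_i$, this is done as follows: choose a random message $m_i \in (\mathbb{F}_{p^k})^s$ and set $y_i = \mathsf{encode}(m_i) + u_i$, where each entry in $u_i$ is a multiple of $p$, chosen uniformly at random, subject to $\|y_i\|_\infty \le N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec} - 1}$. If diag is set to true, then the $m_i$ are chosen to be diagonal elements.

- The prover computes $a_i \leftarrow \mathsf{Enc}_{pk}(y_i, s_i)$, for $i = 1, \ldots, V$, defines $S \in \mathbb{Z}^{V \times d}$ to be the matrix whose $i$th row is $s_i$, and sets $\mathbf{y} \leftarrow (y_1, \ldots, y_V)$, $\mathbf{a} \leftarrow (a_1, \ldots, a_V)$.

- The prover sends $\mathbf{a}$ to the verifier.

- The verifier selects $e \in \{0,1\}^{\mathsf{sec}}$ and sends it to the prover.

- The prover sets $\mathbf{z} \leftarrow (z_1, \ldots, z_V)$, such that $\mathbf{z}^T = \mathbf{y}^T + M_e \cdot \mathbf{x}^T$, and $T = S + M_e \cdot R$. The prover sends $(\mathbf{z}, T)$ to the verifier.

- The verifier computes $d_i \leftarrow \mathsf{Enc}_{pk}(z_i, t_i)$, for $i = 1, \ldots, V$, where $t_i$ is the $i$th row of $T$, and sets $\mathbf{d} \leftarrow (d_1, \ldots, d_V)$.

- The verifier checks that $\mathsf{decode}(z_i) \in (\mathbb{F}_{p^k})^s$ and whether the following three conditions hold; he rejects if not:

$$\mathbf{d}^T = \mathbf{a}^T \boxplus (M_e \boxtimes \mathbf{c}^T), \qquad \|z_i\|_\infty \le N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec} - 1}, \qquad \|t_i\|_\infty \le d \cdot \rho \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec} - 1}.$$

- If diag is set to true, the verifier also checks whether $\mathsf{decode}(z_i)$ is a diagonal element, and rejects if it is not.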

Fig.  9.  The  ZKPoPK  Protocol,  interactive  version. 
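The algebra behind the verifier's check $\mathbf{d}^T = \mathbf{a}^T \boxplus (M_e \boxtimes \mathbf{c}^T)$ only uses the additive homomorphism. The sketch below runs one honest execution of the interactive protocol with a deliberately insecure stand-in $\mathsf{Enc}(x, r) = x + K \cdot r \bmod Q$ (hypothetical constants $K$, $Q$), a single vector length `L` in place of both $N$ and $d$, and without the decode and diag checks. It assumes the prover's bounds $N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec}-1}$ and $d \cdot \rho \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec}-1}$, with $\nu \cdot \mathsf{sec}$ passed as an integer `nu_sec`.

```python
import random

# Toy, insecure, additively homomorphic "encryption" Enc(x, r) = x + K*r mod Q
# (hypothetical stand-in for Enc_pk, used only to check the algebra).
Q, K = 2**61 - 1, 1_000_003


def enc(x, r):
    return [(xj + K * rj) % Q for xj, rj in zip(x, r)]


def ct_lincomb(coeffs, cts):
    """sum_k coeffs[k] * cts[k], computed with the additive homomorphism only."""
    out = [0] * len(cts[0])
    for a, c in zip(coeffs, cts):
        out = [(o + a * cj) % Q for o, cj in zip(out, c)]
    return out


def build_M(e):
    sec = len(e)
    return [[e[i - k] if 0 <= i - k < sec else 0 for k in range(sec)]
            for i in range(2 * sec - 1)]


def zkpopk_round(xs, rs, tau, rho, nu_sec, rng):
    """One honest run of the interactive protocol; returns the verifier's verdict."""
    sec, L = len(xs), len(xs[0])
    V = 2 * sec - 1
    by = L * tau * sec**2 * 2**(nu_sec - 1)      # bound on y_i and z_i
    bs = L * rho * sec**2 * 2**(nu_sec - 1)      # bound on s_i and t_i
    c = [enc(x, r) for x, r in zip(xs, rs)]
    ys = [[rng.randint(-by, by) for _ in range(L)] for _ in range(V)]
    ss = [[rng.randint(-bs, bs) for _ in range(L)] for _ in range(V)]
    a = [enc(y, s) for y, s in zip(ys, ss)]      # prover -> verifier
    e = [rng.randrange(2) for _ in range(sec)]   # verifier -> prover
    M = build_M(e)
    z = [[ys[i][j] + sum(M[i][k] * xs[k][j] for k in range(sec)) for j in range(L)]
         for i in range(V)]
    T = [[ss[i][j] + sum(M[i][k] * rs[k][j] for k in range(sec)) for j in range(L)]
         for i in range(V)]
    for i in range(V):                           # verifier's checks
        if enc(z[i], T[i]) != ct_lincomb([1] + M[i], [a[i]] + c):
            return False
        if max(map(abs, z[i])) > by or max(map(abs, T[i])) > bs:
            return False
    return True
```

The homomorphic check holds identically for an honest prover; only the norm checks can fail, with probability exponentially small in $\nu \cdot \mathsf{sec}$, as argued in the completeness part of the proof of Theorem 5.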
Theorem 5. The protocol $\Pi_{\mathsf{ZKPoPK}}$ (Appendix A.1, Figure 9) is an honest-verifier zero-knowledge proof of knowledge for the relation $R_{\mathsf{PoPK}}$.

Proof  (Theorem  5). 

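Completeness: Assume the prover is honest. For $i = 1, \ldots, V$ the verifier checks whether $\mathsf{Enc}_{pk}(z_i, t_i)$ equals $a_i \boxplus M_{e,i} \cdot \mathbf{c}^T$, where $M_{e,i}$ denotes the $i$th row of $M_e$; since $M_{e,i}$ is a scalar matrix, we write multiplication with $\cdot$ as opposed to $\boxtimes$. The check passes because of the following relation:

$$\begin{aligned} a_i \boxplus M_{e,i} \cdot \mathbf{c}^T &= \mathsf{Enc}_{pk}(y_i, s_i) \boxplus \Big(\textstyle\boxplus_{k=1}^{\mathsf{sec}} M_{e,i,k} \cdot c_k\Big) \\ &= \mathsf{Enc}_{pk}(y_i, s_i) \boxplus \Big(\textstyle\boxplus_{k=1}^{\mathsf{sec}} M_{e,i,k} \cdot \mathsf{Enc}_{pk}(x_k, r_k)\Big) \\ &= \mathsf{Enc}_{pk}\Big(y_i + \textstyle\sum_{k=1}^{\mathsf{sec}} M_{e,i,k} \cdot x_k,\ s_i + \sum_{k=1}^{\mathsf{sec}} M_{e,i,k} \cdot r_k\Big) \\ &= \mathsf{Enc}_{pk}(y_i + M_{e,i} \cdot \mathbf{x}^T,\ s_i + M_{e,i} \cdot R) = \mathsf{Enc}_{pk}(z_i, t_i). \end{aligned}$$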

Moreover, given that $z_i = y_i + M_{e,i} \cdot \mathbf{x}^T$ and that all ciphertexts in $\mathbf{c}$ are $(\tau, \rho)$-ciphertexts, we get that each single coordinate in $M_{e,i} \cdot \mathbf{x}^T$ is numerically at most $\mathsf{sec} \cdot \tau$. Each coordinate of $y_i$ was chosen from an interval that is a factor $N \cdot \mathsf{sec} \cdot 2^{\nu\,\mathsf{sec}}$ larger. So, by a union bound over the $N \cdot \mathsf{sec}$ coordinates involved, the probability that some coordinate of $z_i$ falls outside the required range is exponentially small in sec. A similar argument shows that the check on $\|t_i\|_\infty$ also fails only with negligible probability. Finally, each $y_i$ was constructed to be congruent mod $p$ to the encoding of a value in $(\mathbb{F}_{p^k})^s$. Since this is also the case for the $x_i$'s if the prover is honest, the same is true for the $z_i$'s, and they therefore decode to a value in $(\mathbb{F}_{p^k})^s$. If diag was set to true, all $x_i$, $y_i$ contain diagonal plaintexts, and then the same is true for the $z_i$.

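Soundness: We consider a prover making a verifier accept both $(x, \mathbf{a}, e, (\mathbf{z}, T))$ and $(x, \mathbf{a}, e', (\mathbf{z}', T'))$ with $e \ne e'$. Since both checks $\mathbf{d}^T = \mathbf{a}^T \boxplus (M_e \cdot \mathbf{c}^T)$ and $\mathbf{d}'^T = \mathbf{a}^T \boxplus (M_{e'} \cdot \mathbf{c}^T)$ passed, one can subtract the two equalities and obtain

$$(M_e - M_{e'}) \cdot \mathbf{c}^T = (\mathbf{d} \boxminus \mathbf{d}')^T. \qquad (1)$$

In order to find $\mathbf{x}$ and $R$ such that $c_k = \mathsf{Enc}_{pk}(x_k, r_k)$ for $k = 1, \ldots, \mathsf{sec}$, we first solve (1) as a linear system in $\mathbf{c}$. Let $j$ be the highest index such that $e_j \ne e'_j$. The $\mathsf{sec} \times \mathsf{sec}$ submatrix of $M_e - M_{e'}$ consisting of the rows of $M_e - M_{e'}$ between $j$ and $j + \mathsf{sec} - 1$, both included, is upper triangular with entries in $\{-1, 0, 1\}$, and its diagonal consists of the non-zero value $e_j - e'_j$ (so it is possible to find a solution for $\mathbf{c}$). Since the verifier has values $z_i, t_i, z'_i, t'_i$ such that $d_i = \mathsf{Enc}_{pk}(z_i, t_i)$ and $d'_i = \mathsf{Enc}_{pk}(z'_i, t'_i)$, and given that $c_i = \mathsf{Enc}_{pk}(x_i, r_i)$, it is possible to directly solve the linear system in $\mathbf{x}$ and $R$ (since the cryptosystem is additively homomorphic), from the bottom equation up to the one “in the middle” with index $\mathsf{sec}/2$. Since $\|z_i\|_\infty, \|z'_i\|_\infty \le N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec}-1}$ and $\|t_i\|_\infty, \|t'_i\|_\infty \le d \cdot \rho \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec}-1}$, we conclude that $c_{\mathsf{sec}-i}$ is an $(N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec}+i},\ d \cdot \rho \cdot \mathsf{sec}^2 \cdot 2^{\nu\,\mathsf{sec}+i})$-ciphertext (by induction on $i$). To solve for $c_1, \ldots, c_{\mathsf{sec}/2}$, consider the lowest index $j$ such that $e_j \ne e'_j$, construct a lower triangular matrix in a similar way as above, and solve from the first equation downwards. We conclude that $\mathbf{c}$ contains $(N \cdot \tau \cdot \mathsf{sec}^2 \cdot 2^{(1/2+\nu)\mathsf{sec}},\ d \cdot \rho \cdot \mathsf{sec}^2 \cdot 2^{(1/2+\nu)\mathsf{sec}})$-ciphertexts.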

We note that since the verifier accepted, each $z_i$ has small norm and decodes to a value in $(\mathbb{F}_{p^k})^s$. Since we can write $x_i$ as a linear combination of the $z_i$, it follows from correctness of the cryptosystem that the $x_i$ also decode to values in $(\mathbb{F}_{p^k})^s$. Finally, if diag was set to true, the verifier only accepts if all $z_i$ decode to diagonal values. Again, since we can write $x_i$ as a linear combination of the $z_i$, the $x_i$ also decode to diagonal values.


Zero-Knowledge: We give an honest-verifier simulator for the protocol that outputs accepting conversations. In order to simulate one repetition, the simulator samples $e \in \{0,1\}^{sec}$ uniformly and $z, T$ uniformly with the constraint that $d$ contains random ciphertexts satisfying the verifier's check, i.e., $z_i, t_i$ are uniform, subject to $\|z_i\|_\infty \le N \cdot \tau \cdot sec^2 \cdot 2^{\nu \cdot sec}$ and $\|t_i\|_\infty \le d \cdot \rho \cdot sec^2 \cdot 2^{\nu \cdot sec}$, where moreover $z_i$ is generated as $\mathrm{encode}(m_i) + u_i$, where $m_i$ is a random plaintext (diagonal if diag is set to true) and $u_i$ contains multiples of $p$ that are uniformly random, subject to $\|z_i\|_\infty \le N \cdot \tau \cdot sec^2 \cdot 2^{\nu \cdot sec}$.

Finally, $a$ is computed as $a^T \leftarrow d^T \boxminus (M_e \boxtimes c^T)$. In the real conversation, the prover's choice of values in $z_i$ and $t_i$ is statistically close to the distribution used by the simulator. This is because the prover uses the same method to generate these values, except that he adds in some vectors of exponentially smaller norm, which leads to a statistically close distribution. Since $e$ has the correct distribution and $a$ follows deterministically from the last two messages, the simulation is statistically indistinguishable.

□ 


We now give a protocol that leads to smaller values of the parameters and hence also allows better parameters for the underlying cryptosystem. This version, however, is better suited for use with the Fiat-Shamir heuristic. The idea is to let the prover choose his randomness in a smaller interval, and abort if the last message would reveal too much information. This is an idea from [21]. When using the Fiat-Shamir heuristic, aborting is not a problem, as the prover only needs to show a successful attempt to the verifier. We let $h$ be a suitable hash function that outputs $sec$-bit strings.


Protocol $\Pi_{\mathrm{ZKPoPK}}$

– For $i = 1, \ldots, V$, the prover generates $y_i \leftarrow \mathbb{Z}^N$ and $s_i \leftarrow \mathbb{Z}^d$, such that $\|y_i\|_\infty \le 128 \cdot N \cdot \tau \cdot sec^2$ and $\|s_i\|_\infty \le 128 \cdot d \cdot \rho \cdot sec^2$. For $y_i$, this is done as follows: choose a random message $m_i \in (\mathbb{F}_{p^k})^s$ and set $y_i = \mathrm{encode}(m_i) + u_i$, where each entry in $u_i$ is a multiple of $p$, chosen uniformly at random, subject to $\|y_i\|_\infty \le 128 \cdot N \cdot \tau \cdot sec^2$. If diag is set to true, then the $m_i$ are additionally chosen to be diagonal elements.

– The prover computes $a_i \leftarrow \mathrm{Enc}_{pk}(y_i, s_i)$ for $i = 1, \ldots, V$, defines $S \in \mathbb{Z}^{V \times d}$ to be the matrix whose $i$th row is $s_i$, and sets $y \leftarrow (y_1, \ldots, y_V)$, $a \leftarrow (a_1, \ldots, a_V)$.

– The prover sends $a$ to the verifier.

– The prover computes $e = h(a, c)$.

– The prover sets $z \leftarrow (z_1, \ldots, z_V)$ such that $z^T = y^T + M_e \cdot x^T$, and $T = S + M_e \cdot R$. Let $t_i$ be the $i$th row of $T$. If for any $i$ it is the case that $\|z_i\|_\infty > 128 \cdot N \cdot \tau \cdot sec^2 - \tau \cdot sec$ or $\|t_i\|_\infty > 128 \cdot d \cdot \rho \cdot sec^2 - \rho \cdot sec$, the prover aborts and the protocol is restarted. Otherwise the prover sends $(a, z, T)$ to the verifier.

– The verifier computes $e = h(a, c)$ and $d_i \leftarrow \mathrm{Enc}_{pk}(z_i, t_i)$ for $i = 1, \ldots, V$, where $t_i$ is the $i$th row of $T$, and sets $d \leftarrow (d_1, \ldots, d_V)$.

– The verifier checks $\mathrm{decode}(z_i) \in (\mathbb{F}_{p^k})^s$ and whether the following three conditions hold:

$$d^T = a^T \boxplus (M_e \boxtimes c^T), \qquad \|z_i\|_\infty \le 128 \cdot N \cdot \tau \cdot sec^2, \qquad \|t_i\|_\infty \le 128 \cdot d \cdot \rho \cdot sec^2.$$

If diag is set to true, the verifier also checks whether $\mathrm{decode}(z_i)$ is a diagonal element, and rejects if it is not.

Fig. 10. The ZKPoPK Protocol, version for Fiat-Shamir heuristic.
We claim that the Fiat-Shamir based protocol is a proof of knowledge for the relation in question in the random oracle model. In this case, however, we can guarantee that the adversarially generated ciphertexts are $(N \cdot \tau \cdot sec^2 \cdot 2^{sec/2+8}, d \cdot \rho \cdot sec^2 \cdot 2^{sec/2+8})$-ciphertexts.


Completeness: Assume the prover is honest. Note first that each $y_i$ was constructed to be congruent mod $p$ to the encoding of a value in $(\mathbb{F}_{p^k})^s$. Since this is also the case for the $x_i$'s if the prover is honest, the same is true for the $z_i$'s, and they therefore always decode to a value in $(\mathbb{F}_{p^k})^s$. If diag was set to true, all $x_i, y_i$ contain diagonal plaintexts, and then the same is true for the $z_i$.

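Next, for $i = 1, \ldots, V$ the verifier checks whether $\mathrm{Enc}_{pk}(z_i, t_i)$ equals $a_i \boxplus (M_{e,i} \cdot c^T)$, where $M_{e,i}$ denotes the $i$th row of $M_e$; since $M_{e,i}$ has scalar entries, we write multiplication with $\cdot$ as opposed to $\boxtimes$. The check passes because of the following relation:

$$\begin{aligned}
a_i \boxplus (M_{e,i} \cdot c^T) &= \mathrm{Enc}_{pk}(y_i, s_i) \boxplus \Big( \mathop{\boxplus}_{k=1}^{sec} M_{e,i,k} \cdot \mathrm{Enc}_{pk}(x_k, r_k) \Big) \\
&= \mathrm{Enc}_{pk}\Big( y_i + \sum_{k=1}^{sec} M_{e,i,k} \cdot x_k,\; s_i + \sum_{k=1}^{sec} M_{e,i,k} \cdot r_k \Big) \\
&= \mathrm{Enc}_{pk}\big( y_i + M_{e,i} \cdot x^T,\; s_i + M_{e,i} \cdot R \big) = \mathrm{Enc}_{pk}(z_i, t_i).
\end{aligned}$$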


Moreover, given that $z_i = y_i + M_{e,i} \cdot x^T$ and that all ciphertexts in $c$ are $(\tau, \rho)$-ciphertexts, we get that each single coordinate in $M_{e,i} \cdot x^T$ is numerically at most $sec \cdot \tau$. Each coordinate of $y_i$ was chosen from an interval that is a factor $128 \cdot N \cdot sec$ larger. Therefore each coordinate in $z_i$ fails to be in the required range with probability at most $1/(128 \cdot N \cdot sec)$. Note that this probability does not depend on the concrete values of the coordinates in $M_{e,i} \cdot x^T$, only on the bound on their numeric value.

By a union bound over the $N$ coordinates of $z_i$, we get that $\|z_i\|_\infty \le 128 \cdot N \cdot \tau \cdot sec^2 - \tau \cdot sec$ fails with probability at most $1/(128 \cdot sec)$, and by a final union bound over the $2 \cdot sec - 1$ ciphertexts, all checks on the $z_i$'s are ok except with probability at most $1/64$. A similar argument shows that the check $\|t_i\|_\infty \le 128 \cdot d \cdot \rho \cdot sec^2 - \rho \cdot sec$ also fails with probability at most $1/64$. The conclusion is that the prover will abort with probability at most $1/32$, so in expectation at most $32/31$ runs of the protocol are needed before it succeeds.
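The abort analysis can be checked numerically. The Python sketch below computes exactly how often a single coordinate $z = y + \text{shift}$ leaves the accepted range when $y$ is uniform on the integers in $[-B, B]$ and $|\text{shift}| \le \delta$; with $B = 128 \cdot N \cdot \tau \cdot sec^2$ and $\delta = sec \cdot \tau$ this is the per-coordinate event bounded in the text. The function names and toy parameters are illustrative only, and the $t$-check is assumed to obey the same bound, as the text states.

```python
from fractions import Fraction


def coordinate_abort_probability(bound, delta, shift):
    """Exact probability that z = y + shift falls outside [-(bound - delta), bound - delta]
    when y is uniform on the integers in [-bound, bound] and |shift| <= delta."""
    assert abs(shift) <= delta < bound
    limit = bound - delta
    failures = sum(1 for y in range(-bound, bound + 1) if abs(y + shift) > limit)
    return Fraction(failures, 2 * bound + 1)


def prover_abort_bound(sec):
    """Union bound from the text: a coordinate fails with probability at most
    1/(128*N*sec); N coordinates give 1/(128*sec); 2*sec - 1 ciphertexts give
    less than 1/64 for the z-check, and the same again for the t-check."""
    z_check = (2 * sec - 1) * Fraction(1, 128 * sec)
    return 2 * z_check


# Toy numbers: the failure count is 2*delta for every admissible shift.
print([coordinate_abort_probability(50, 3, s) for s in (-3, 0, 3)])
# [Fraction(6, 101), Fraction(6, 101), Fraction(6, 101)]
print(prover_abort_bound(40) <= Fraction(1, 32))  # True
```

For every admissible shift exactly $2\delta$ of the $2B+1$ values of $y$ cause an abort, which matches the remark that the probability depends only on the bound and not on the concrete value of $M_{e,i} \cdot x^T$.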


Soundness: By a standard argument, a prover who can efficiently produce a valid proof is able to produce $(x, a, e, (z, T))$ and $(x, a, e', (z', T'))$ with $e \neq e'$ that the verifier would accept. Since both checks $d^T = a^T \boxplus (M_e \boxtimes c^T)$ and $d'^T = a^T \boxplus (M_{e'} \boxtimes c^T)$ passed, one can subtract the two equalities and obtain

$$(M_e - M_{e'}) \boxtimes c^{T} = (d \boxminus d')^{T} \qquad (2)$$

In order to find $x$ and $R$ such that $c_k = \mathrm{Enc}_{pk}(x_k, r_k)$ for $k = 1, \ldots, sec$, we first solve (2) as a linear system in $c$. Let $j$ be the highest index such that $e_j \neq e'_j$. The $sec \times sec$ submatrix of $M_e - M_{e'}$ consisting of the rows of $M_e - M_{e'}$ between $j$ and $j + sec - 1$, both included, is upper triangular with entries in $\{-1, 0, 1\}$, and its diagonal consists of the non-zero value $e_j - e'_j$ (so it is possible to find a solution for $c$). Since the verifier has values $z_i, t_i, z'_i, t'_i$ such that $d_i = \mathrm{Enc}_{pk}(z_i, t_i)$ and $d'_i = \mathrm{Enc}_{pk}(z'_i, t'_i)$, and given that $c_i = \mathrm{Enc}_{pk}(x_i, r_i)$, it is possible to directly solve the linear system in $x$ and $R$ (since the cryptosystem is additively homomorphic), from the bottom equation to the one "in the middle" with index $sec/2$.

Since $\|z_i\|_\infty, \|z'_i\|_\infty \le 128 \cdot N \cdot \tau \cdot sec^2$ and $\|t_i\|_\infty, \|t'_i\|_\infty \le 128 \cdot d \cdot \rho \cdot sec^2$, we conclude that $c_{sec-i}$ must be a $(256 \cdot N \cdot \tau \cdot 2^{i} \cdot sec^2, 256 \cdot d \cdot \rho \cdot 2^{i} \cdot sec^2)$-ciphertext (by induction on $i$). To solve for $c_1, \ldots, c_{sec/2}$, consider the lowest index $j$ such that $e_j \neq e'_j$, construct a lower triangular matrix in a similar way as above, and solve from the first equation downwards. We conclude that $c$ contains $(N \cdot \tau \cdot sec^2 \cdot 2^{sec/2+8}, d \cdot \rho \cdot sec^2 \cdot 2^{sec/2+8})$-ciphertexts.

We note that since the verifier accepted, each $z_i$ has small norm and decodes to a value in $(\mathbb{F}_{p^k})^s$. Since we can write $x_i$ as a linear combination of the $z_i$, it follows from correctness of the cryptosystem that the $x_i$ also decode to values in $(\mathbb{F}_{p^k})^s$. Finally, if diag was set to true, the verifier only accepts if all $z_i$ decode to diagonal values. Again, since we can write $x_i$ as a linear combination of the $z_i$, the $x_i$ also decode to diagonal values.

Zero-Knowledge: We give an honest-verifier simulator for the protocol that outputs an accepting conversation (one that does not abort).

In order to simulate one repetition, the simulator samples $e \in \{0,1\}^{sec}$ uniformly and $z, T$ uniformly with the constraint that $d$ contains random $(128 \cdot N \cdot \tau \cdot sec^2 - \tau \cdot sec, 128 \cdot d \cdot \rho \cdot sec^2 - \rho \cdot sec)$-ciphertexts, where moreover $z_i$ is generated as $\mathrm{encode}(m_i) + u_i$, where $m_i$ is a random plaintext (a diagonal one if diag is set to true) and $u_i$ contains multiples of $p$ that are uniformly random, subject to $\|z_i\|_\infty \le 128 \cdot N \cdot \tau \cdot sec^2 - \tau \cdot sec$. Finally, $a$ is computed as $a^T \leftarrow d^T \boxminus (M_e \boxtimes c^T)$. Define the random oracle to output $e$ on input $(a, c)$, output $(a, e, (z, T))$ and stop.

We argue that this simulation is perfect: the distribution of a simulated $e$ is the same as that of a real one. Also, it is straightforward to see that in a real conversation, given that the prover does not abort, the vectors $z_i, t_i$ will be uniformly random, subject to $\|z_i\|_\infty \le 128 \cdot N \cdot \tau \cdot sec^2 - \tau \cdot sec$ and $\|t_i\|_\infty \le 128 \cdot d \cdot \rho \cdot sec^2 - \rho \cdot sec$. So the simulator chooses $z_i, t_i$ with exactly the right distribution. Since the value of $a$ follows deterministically from $e, z_i, t_i$, we have what we wanted.
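The perfect-simulation claim rests on rejection sampling removing any dependence on the secret. The following one-dimensional Python sketch is a simplification that ignores the $\mathrm{encode}(m_i) + u_i$ structure and works with plain integers; the names and parameters are hypothetical. It enumerates every $y$ and shows that, conditioned on no abort, $z = y + \text{shift}$ takes each value in $[-(B-\delta), B-\delta]$ exactly once, whatever the shift.

```python
from collections import Counter


def accepted_z_counts(bound, delta, shift):
    """Histogram of z = y + shift over all y in [-bound, bound] for which the
    prover does not abort, i.e. |z| <= bound - delta."""
    limit = bound - delta
    return Counter(y + shift for y in range(-bound, bound + 1)
                   if abs(y + shift) <= limit)


# The shift stands in for one coordinate of M_e . x^T (|shift| <= delta).
for shift in (-3, 0, 2, 3):
    counts = accepted_z_counts(20, 3, shift)
    print(shift, min(counts), max(counts), set(counts.values()))
# each line shows the range -17 .. 17 and multiplicities {1}, e.g. "-3 -17 17 {1}"
```

Because the accepted histogram is identical for every shift, a simulator that samples $z$ uniformly from the accepted range produces exactly the distribution of a non-aborting real run.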

Doing without random oracles. The above protocol can also be executed without using the Fiat-Shamir heuristic. In this case, the prover will start $sec/5$ instances of the protocol, computing $a_1, \ldots, a_{sec/5}$. We choose this number of instances because it ensures that the prover fails on all of them with probability only $(1/32)^{sec/5} = 2^{-sec}$. The prover commits to all these values, which can be done, for instance, with a Merkle hash tree, in which case the commitment will be very short, and any of the $a_i$'s can be opened by sending a piece of information that is only logarithmic in $sec$.

The verifier selects $e$, the prover finds an instance where he would not abort the protocol with this $e$, opens the corresponding $a_i$ and completes that instance.

This is complete and zero-knowledge by the same argument as above plus the hiding property of the commitment scheme used. Soundness follows from the fact that if the prover succeeds with probability significantly greater than $2^{-sec} \cdot sec/5$, he must be able to answer different challenges correctly for some fixed instance out of the $sec/5$ we have. Such answers can be extracted by rewinding, and then the rest of the argument is the same as above.
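A minimal Python sketch of the Merkle-tree commitment mentioned above, assuming SHA-256 as the hash, a random 16-byte salt per leaf (so that the root does not reveal the committed $a_i$), and one-byte prefixes to separate leaf and inner-node hashing. These choices, and the names `commit`, `open_one` and `verify`, are illustrative assumptions rather than the construction used in the paper. Opening one of the $sec/5$ instances costs one sibling hash per tree level, i.e. about $\log_2(sec/5)$ hashes.

```python
import hashlib
import os


def _h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def commit(values):
    """Merkle-commit to a list of byte strings; returns (root, opening state).
    Each leaf is salted so that the root reveals nothing about the values."""
    salts = [os.urandom(16) for _ in values]
    level = [_h(b"\x00", salt, v) for salt, v in zip(salts, values)]
    levels = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        levels.append(level)
        level = [_h(b"\x01", level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], (salts, levels)


def open_one(state, values, index):
    """Opening for a single instance: value, salt and one sibling hash per level."""
    salts, levels = state
    path, i = [], index
    for level in levels:
        path.append(level[i ^ 1])
        i //= 2
    return values[index], salts[index], path


def verify(root, index, value, salt, path):
    """Recompute the root from the leaf and the sibling path."""
    node, i = _h(b"\x00", salt, value), index
    for sibling in path:
        node = _h(b"\x01", node, sibling) if i % 2 == 0 else _h(b"\x01", sibling, node)
        i //= 2
    return node == root


# sec = 40 gives sec/5 = 8 instances; an opening carries log2(8) = 3 hashes.
first_messages = [f"a_{i}".encode() for i in range(40 // 5)]
root, state = commit(first_messages)
value, salt, path = open_one(state, first_messages, 5)
print(len(path), verify(root, 5, value, salt, path))  # 3 True
```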

A.2  The  UC  Model 

In the following sections, we show that the online and preprocessing phases of our protocol are secure in the UC model. We briefly recall how this model works: we will use the variant where there is only one adversarial entity, the environment $\mathcal{Z}$. The environment chooses inputs for the honest players and gets their outputs when the protocol is done. It also attacks the protocol, which in our case means that it corrupts up to $n - 1$ of the players and takes control over their actions. When $\mathcal{Z}$ stops, it outputs a bit. This process, where $\mathcal{Z}$ interacts with the real players and protocol, is called the real process.

To define what it means that the protocol implements functionality $\mathcal{F}$ securely, we assume there exists a simulator $\mathcal{S}$ that interacts with both $\mathcal{F}$ and $\mathcal{Z}$. Towards $\mathcal{F}$, it chooses inputs for the corrupt players and gets their outputs. Towards $\mathcal{Z}$, it must simulate a view of the protocol that looks like what $\mathcal{Z}$ would see in a real attack. This process is called the ideal process, and here $\mathcal{F}$ supplies $\mathcal{Z}$ with the i/o interface of honest players. We say that the protocol implements $\mathcal{F}$ securely if the probabilities that $\mathcal{Z}$ outputs 1 in the real and in the ideal process differ only negligibly. We speak of computational security if $\mathcal{Z}$ is assumed to be poly-time bounded and of statistical security if $\mathcal{Z}$ is unbounded.

A.3 Online Phase

On generating the $e_i$'s. Before proving the online protocol UC secure, we compute the probability of getting away with cheating in step 4 of 'Output' and how this depends on the way we generate the $e_i$'s.

For  this  purpose  we  design  the  following  security  game: 

1. The challenger generates the secret key $\alpha$ and MACs $\gamma_i \leftarrow \alpha m_i$ and sends messages $m_1, \ldots, m_T$ to the adversary.

2. The adversary sends back messages $m'_1, \ldots, m'_T$.

3. The challenger generates random values $e_1, \ldots, e_T \leftarrow \mathbb{F}_{p^k}$ and sends them to the adversary.

4. The adversary provides an error $\Delta$.

5. Set $m \leftarrow \sum_{i=1}^{T} e_i m'_i$ and $\gamma \leftarrow \sum_{i=1}^{T} e_i \gamma_i$. Now, the challenger checks that $\alpha m = \gamma + \Delta$.

The adversary wins the game if there is an $i$ for which $m'_i \neq m_i$ and the final check goes through.

It is not difficult to see that this game indeed models 'Output' (up to step 4): the second step in the game, where the adversary sends the $m'_i$'s, models the fact that corrupted players can choose to lie about their shares of values opened during the protocol execution. $\Delta$ models the fact that the adversary is allowed to introduce errors on the MACs when data are sent to $\mathcal{F}_{\mathrm{PREP}}$ in the initial part of the protocol, and may also modify the shares of MACs held by corrupt players. Finally, since $\alpha, \gamma$ are secret shared in the protocol, the adversary has no information on $\alpha, \gamma$ ahead of time in the protocol, just as in the security game.

Now, let us look at the probability of winning the game if the $e_i$'s are randomly chosen. If the check goes through, then $\alpha \sum_{i=1}^{T} e_i m'_i = \alpha \sum_{i=1}^{T} e_i m_i + \Delta$, that is, $\alpha \sum_{i=1}^{T} e_i (m'_i - m_i) = \Delta$. First we consider the case where $\sum_{i=1}^{T} e_i (m'_i - m_i) \neq 0$, so $\alpha = \Delta / \sum_{i=1}^{T} e_i (m'_i - m_i)$. This implies that being able to pass the check is equivalent to guessing $\alpha$. However, since the adversary has no information about $\alpha$, this happens with probability only $1/|\mathbb{F}_{p^k}|$. So what is left is to argue that $\sum_{i=1}^{T} e_i (m'_i - m_i) = 0$ also happens with very low probability. This can be seen as follows. We define $\mu_i := (m'_i - m_i)$ and $\mu := (\mu_1, \ldots, \mu_T)$, $e := (e_1, \ldots, e_T)$. Now $f_\mu(e) := e \cdot \mu = \sum_{i=1}^{T} e_i \mu_i$ defines a linear mapping, which is not the 0-mapping since at least one $\mu_i \neq 0$. From linear algebra, the rank-nullity theorem then tells us that $\dim(\ker(f_\mu)) = T - 1$. Also, since $e$ is random and the adversary does not know $e$ when choosing the $m'_i$'s, the probability of $e \in \ker(f_\mu)$ is $|\mathbb{F}_{p^k}^{T-1}| / |\mathbb{F}_{p^k}^{T}| = 1/|\mathbb{F}_{p^k}|$. Summing up, the total probability of winning the game is at most $2/|\mathbb{F}_{p^k}|$.
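The counting argument can be verified by brute force over a small prime field $\mathbb{F}_p$, used here as a stand-in for $\mathbb{F}_{p^k}$. The Python sketch below (hypothetical helper name, toy parameters) enumerates all keys $\alpha$ and challenge vectors $e$ for a fixed non-zero error vector $\mu$ and a fixed $\Delta$, and reports the exact probability that the final check passes.

```python
from fractions import Fraction
from itertools import product


def win_probability(p, mu, delta):
    """Exact probability, over uniform alpha in F_p and uniform e in F_p^T, that
    the final check alpha * sum_i e_i * mu_i == delta (mod p) passes, where
    mu_i = m'_i - m_i is the adversary's fixed, non-zero error vector."""
    T = len(mu)
    wins = 0
    for alpha in range(p):
        for e in product(range(p), repeat=T):
            s = sum(ei * mi for ei, mi in zip(e, mu)) % p
            if (alpha * s - delta) % p == 0:
                wins += 1
    return Fraction(wins, p ** (T + 1))


p = 7
worst = max(win_probability(p, mu, delta)
            for mu in [(1, 0), (2, 5), (3, 3)]
            for delta in range(p))
print(worst, worst <= Fraction(2, p))  # 13/49 True
```

The worst case is $\Delta = 0$, where the adversary wins if either $e \in \ker(f_\mu)$ or $\alpha = 0$: this gives $1/p + (1 - 1/p)/p = 13/49$ for $p = 7$, just under the bound $2/p = 14/49$.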

Since choosing the $e_i$'s uniformly would require an expensive coin-flip protocol, we use a different way to generate them in the protocol: namely $e_1$ is chosen at random and, for $i > 1$, $e_i \leftarrow e_1^i$. This has the advantage of adding only a constant number of multiplications in $\mathbb{F}_{p^k}$ per secure multiplication. On the security side, we still want $\sum_{i=1}^{T} e_i (m'_i - m_i) = 0$ to happen only with small probability. Viewing $\sum_{i=1}^{T} e_1^i (m'_i - m_i)$ as a polynomial of degree $T$ in $e_1$, we know it has at most $T$ roots, so we have to make sure we have an upper bound on $T$ such that $e_1$ is chosen from a field big enough for $T/p^k$ to be negligible.
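The following Python sketch, over a small prime field standing in for $\mathbb{F}_{p^k}$ (names and parameters are hypothetical), shows both sides of this trade-off: each $e_i = e_1^i$ costs a single multiplication, and the set of "bad" values of $e_1$ for a fixed non-zero error vector is the root set of a polynomial of degree at most $T$. Note that $e_1 = 0$ is always among them, since the polynomial has no constant term.

```python
def power_challenges(e1, T, p):
    """e_i = e1**i mod p for i = 1..T; each new value costs one multiplication."""
    out, cur = [], 1
    for _ in range(T):
        cur = cur * e1 % p
        out.append(cur)
    return out


def bad_challenges(mu, p):
    """All e1 in F_p with sum_i e1**i * mu_i == 0 (mod p): the roots of a
    non-zero polynomial of degree at most T = len(mu), hence at most T of them."""
    T = len(mu)
    return [e1 for e1 in range(p)
            if sum(e * m for e, m in zip(power_challenges(e1, T, p), mu)) % p == 0]


print(power_challenges(2, 4, 101))  # [2, 4, 8, 16]
print(bad_challenges([3, 0, 5, 1], 101))
```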

An alternative approach would be to use a pseudorandom generator $G$. We would then have shared some random seed $\langle s \rangle$. By opening $\langle s \rangle$ and feeding it to $G$, we can generate $T$ pseudorandom elements. In the protocol, the parties would commit to their shares of the MAC on $s$, and when $s$ becomes public, the MAC would be checked. If it is OK, the protocol would go on with the rest of the checks. With respect to cheating, the argument is basically the same: if an adversary $\mathcal{A}$ has a significant probability of choosing $m'_i$'s such that $\sum_{i=1}^{T} e_i (m'_i - m_i) = 0$, then $G$ is a bad pseudorandom generator, or in other words, we can use $\mathcal{A}$ to break $G$. With this way of generating the $e_i$'s, we increase the complexity of one secure multiplication by whatever $G$ needs to generate one pseudorandom element.
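As an illustration only, the Python sketch below expands an opened seed into $T$ field elements using SHA-256 in counter mode as a heuristic stand-in for $G$; the text does not fix a particular generator, and the modulus and function name are assumptions.

```python
import hashlib


def prg_challenges(seed: bytes, T: int, p: int):
    """Expand an opened seed into T pseudorandom elements of F_p.

    Counter-mode SHA-256 is only an illustrative stand-in for the generator G.
    Reducing a 256-bit digest mod p has negligible bias when p is far below 2**256.
    """
    out = []
    for i in range(T):
        digest = hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        out.append(int.from_bytes(digest, "big") % p)
    return out


P = 2**61 - 1  # illustrative prime modulus
e = prg_challenges(b"opened seed s", 5, P)
print(len(e), all(0 <= x < P for x in e))  # 5 True
```

Every party that sees the same opened seed derives the same challenges, so no further interaction is needed beyond opening $\langle s \rangle$ and checking its MAC.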

Proof (Theorem 1). We construct a simulator $\mathcal{S}_{\mathrm{AMPC}}$ such that a poly-time environment $\mathcal{Z}$ cannot distinguish between the real protocol system and the ideal one. We assume here static, active corruption. The simulator runs a copy of the protocol $\Pi_{\mathrm{online}}$ and simulates the ideal functionalities for preprocessing and commitment. It relays messages between the parties/$\mathcal{F}_{\mathrm{PREP}}$ and $\mathcal{Z}$, such that $\mathcal{Z}$ will see the same interface as when interacting with a real protocol. The specification of the simulator $\mathcal{S}_{\mathrm{AMPC}}$ is presented in Figure 11.

Simulator $\mathcal{S}_{\mathrm{AMPC}}$

Initialize: The simulator creates the desired number of triples by doing the steps in $\mathcal{F}_{\mathrm{PREP}}$. Note that here the simulator will read all data of the corrupted parties specified to the copy of $\mathcal{F}_{\mathrm{PREP}}$.

Rand: The simulator runs the copy protocol honestly and calls rand on the ideal functionality $\mathcal{F}_{\mathrm{AMPC}}$.
Input: If $P_i$ is not corrupted, the copy is run honestly with dummy input, for example 0.

If $P_i$ is corrupted, the input step is done honestly and then the simulator waits for $P_i$ to broadcast $\epsilon$. Given this, the simulator can compute $x_i \leftarrow (r + \epsilon)$ since it knows (all the shares of) $r$. This is the supposed input of $P_i$, which the simulator now gives to the ideal functionality $\mathcal{F}_{\mathrm{AMPC}}$.
Add: The simulator runs the protocol honestly and calls add on the ideal functionality $\mathcal{F}_{\mathrm{AMPC}}$.
Multiply: The simulator runs the protocol honestly and calls multiply on the ideal functionality $\mathcal{F}_{\mathrm{AMPC}}$.
Output: The output step is run and the protocol is aborted if one of the checks in step 4 does not go through. Otherwise the simulator calls output on $\mathcal{F}_{\mathrm{AMPC}}$ and gets the result $y$ back. Now it has to simulate shares $y_j$ of honest parties such that they are consistent with $y$. Note that the simulator already has shares of an output value $y'$ that was computed using the dummy inputs, as well as shares of the MAC for $y'$. The simulator now selects an honest party, say $P_k$, and adds $y - y'$ to his share of $y$ and $\alpha(y - y')$ to his share of the MAC. Note that the simulator can compute $\alpha(y - y')$ since it knows from the beginning (all the shares of) $\alpha$. Now it simulates the openings of shares of $y$ towards the environment according to the protocol. If this terminates correctly, it sends OK to $\mathcal{F}_{\mathrm{AMPC}}$ (causing it to output $y$ to the honest players).

Fig. 11. The simulator for $\mathcal{F}_{\mathrm{AMPC}}$.
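A simplified Python sketch of the output-patching step, assuming that a value and its MAC are represented by additive shares of $y$ and of $\alpha y$ over a prime field (any public modifier in the full MAC representation is ignored). The helper names, the field, and party 0 playing the honest $P_k$ are hypothetical.

```python
import random


def share(value, n, p, rng):
    """Additive sharing of value in F_p among n parties."""
    shares = [rng.randrange(p) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % p)
    return shares


def patch_output(y_shares, mac_shares, y, y_dummy, alpha, honest, p):
    """Shift one honest party's shares so the sharing of y' (and of its MAC
    alpha * y') becomes a sharing of y (and of alpha * y)."""
    diff = (y - y_dummy) % p
    y_shares, mac_shares = list(y_shares), list(mac_shares)
    y_shares[honest] = (y_shares[honest] + diff) % p
    mac_shares[honest] = (mac_shares[honest] + alpha * diff) % p
    return y_shares, mac_shares


p, rng = 2**61 - 1, random.Random(1)
alpha = rng.randrange(p)             # the simulator knows all shares of alpha
y_dummy, y = 17, 42                  # y' from dummy inputs, y from F_AMPC
ys = share(y_dummy, 3, p, rng)
macs = share(alpha * y_dummy % p, 3, p, rng)
ys2, macs2 = patch_output(ys, macs, y, y_dummy, alpha, honest=0, p=p)
print(sum(ys2) % p, sum(macs2) % p == alpha * y % p)  # 42 True
```

Only the chosen honest party's shares change; the shares held by corrupt parties are left untouched, so their view of the sharing is unchanged.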


To  see  that  the  simulated  and  real  processes  cannot  be  distinguished,  we  will  show  that  the 
view  of  the  environment  in  the  ideal  process  is  statistically  indistinguishable  from  the  view  in  the 
real  process.  This  view  consists  of  the  corrupt  players’  view  of  the  protocol  execution  as  well  as  the 
inputs  and  outputs  of  honest  players. 

We first argue that the view up to the point where the output value is opened (step 5 of the 'Output' stage of the protocol) has exactly the same distribution in the real and in the simulated case. First, the values broadcast by honest players in the input stage are always uniformly random. Second, when a value is partially opened in a secure multiplication, fresh shares of a random value are subtracted, so the honest players will always send a set of uniformly random and independent values. Third, the honest players hold shares in MACs on the opened values; these are random sharings of a correct MAC with an added error that is determined by the errors specified by the environment in the initial phase. Therefore, the MAC and shares revealed in step 4 of 'Output' also have the same distribution in the simulated as in the real process. Finally, note that if the simulated protocol aborts, the simulator makes the ideal functionality fail, so the environment will see that honest players generate no output, just as when the real process aborts.

Now,  if  the  real  or  simulated  protocol  proceeds  to  the  last  step,  the  only  new  data  that  the 
environment  sees  is  an  output  value  y,  plus  some  shares  of  honest  players.  These  are  random  shares 
that are consistent with y and its MAC in both the simulated and real case. In other words, the environment's view of the last step has the same distribution in the real and simulated case as long as y is the same.

In  the  simulation,  y  is  of  course  the  correct  evaluation  on  the  inputs  matching  the  shares  that 
were  read  from  the  corrupted  parties  in  the  beginning.  To  finish  the  proof,  it  is  therefore  sufficient 
to  show  that  the  same  happens  in  the  real  process  with  overwhelming  probability.  In  other  words, 
the  event  that  the  real  protocol  terminates  but  the  output  is  not  correct  occurs  with  negligible 
probability. 

Incorrect outputs can result either from corrupted parties who successfully cheat with their shares during the protocol, or from having computed with triples where the multiplicative relation does not hold (even if the revealed shares were correct). For the latter case we argue that with correct shares the multiplicative relation holds with overwhelming probability, and this follows from the check on the triples in step 1 of 'Multiply': it is easy to see that if the triples are correct, the check will be true. On the other hand, if some triple is not correct (in spite of correct shares), the probability of satisfying the check is $1/|\mathbb{F}_{p^k}|$, since there is only one random challenge $t$ for which $t \cdot (c - a \cdot b) = (h - g \cdot f)$. For the former case, regarding the checking of shares, we have checks related to the openings of $[\![\cdot]\!]$-values (during 'Input' and a single one in 'Output'). The rest of the checking is done in steps 4 and 5 of 'Output'. Being able to cheat during an opening of a $[\![\cdot]\!]$-value corresponds to guessing at least one private key $\beta_i$. Assuming $\beta_i$ is chosen randomly in $\mathbb{F}_{p^k}$, the probability is at most $1/|\mathbb{F}_{p^k}|$. Furthermore, as we discussed at the beginning of this section, the probability of a party being able to cheat in step 4 is $(T + 1)/|\mathbb{F}_{p^k}|$, where $T$ is the number of values opened during secure multiplications. In step 5, only one MAC is checked for each output, so here the probability of cheating is $1/|\mathbb{F}_{p^k}|$ per check, as argued earlier. Since the protocol aborts as soon as a check fails, the probability that it terminates with an incorrect output is the maximum probability with which any single check can be cheated, which in our case is $(T + 1)/|\mathbb{F}_{p^k}|$. This is negligible, since we assume that $T$ is polynomial while $|\mathbb{F}_{p^k}|$ is exponential in $sec$. □
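The relation $t \cdot (c - a \cdot b) = (h - g \cdot f)$ can be seen by expanding one common form of the sacrifice check. The Python sketch below assumes that step 1 of 'Multiply' opens $\rho = t \cdot a - f$ and $\sigma = b - g$ and tests $t \cdot c - h - \sigma \cdot f - \rho \cdot g - \sigma \cdot \rho = 0$; this exact form is an assumption, since the step is not reproduced in this section. It works on plain values in a small prime field rather than on shares, and the function name is hypothetical.

```python
def sacrifice_check(a, b, c, f, g, h, t, p):
    """Assumed sacrifice of (f, g, h) to check (a, b, c), shown on plain values:
    open rho = t*a - f and sigma = b - g, then test
    t*c - h - sigma*f - rho*g - sigma*rho == 0 (mod p).
    Expanding the left side gives t*(c - a*b) - (h - g*f)."""
    rho = (t * a - f) % p
    sigma = (b - g) % p
    return (t * c - h - sigma * f - rho * g - sigma * rho) % p == 0


p = 101
good = [t for t in range(p) if sacrifice_check(3, 5, 15, 7, 9, 63, t, p)]
bad = [t for t in range(p) if sacrifice_check(3, 5, 16, 7, 9, 64, t, p)]
print(len(good), bad)  # 101 [1]
```

With two correct triples every challenge passes, while a single wrong triple leaves exactly one passing $t$, matching the $1/|\mathbb{F}_{p^k}|$ bound.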

Commitments based on $\mathcal{F}_{\mathrm{PREP}}$. In the above we assumed access to an ideal functionality for commitments. We can, however, do the commitments needed in our protocol based only on the output of $\mathcal{F}_{\mathrm{PREP}}$, as follows. First a random value [r] is opened to the committer $P_i$ (this could even be done in the preprocessing). To commit to a value $x$, $P_i$ broadcasts $c = r + x$. To open the commitment, [r] is opened to all the players, who can now compute $c - r = x$. Correctness is still guaranteed because of the MACs in [r]. Furthermore, since to begin with [r] is only opened to $P_i$, $c$ is indistinguishable from a random value and can thus easily be simulated. To simulate during 'Output' when $P_i$ is honest and has to open his commitment, the simulator simply changes $P_i$'s share of [r] and the shares of the MACs to make them fit with the broadcast value and the value he should have committed to. This is possible because the simulator knows all MAC keys. It is easy to see that this has communication and computational complexity $O(n^2)$ per commitment.
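A stripped-down Python sketch of this commitment, over a prime field chosen only for illustration and with the MAC checks on [r] omitted; the function names are hypothetical. It shows the one-time-pad structure $c = r + x$ and why the simulator can open $c$ to any value by adjusting $r$.

```python
P = 2**61 - 1  # illustrative prime modulus


def pad_commit(x, r):
    """P_i knows r (opened only to him) and broadcasts c = r + x."""
    return (r + x) % P


def pad_open(c, r):
    """Once [r] is opened to everyone, each player computes x = c - r."""
    return (c - r) % P


def pad_equivocate(c, x_target):
    """Simulator: the value of r that makes c open to x_target."""
    return (c - x_target) % P


c = pad_commit(5, 33)
print(pad_open(c, 33), pad_open(c, pad_equivocate(c, 77)))  # 5 77
```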

Implementing Broadcast and Multiple Inputs/Outputs. To implement broadcast based on point-to-point channels, we first observe that since we do not guarantee termination anyway, the broadcast does not have to terminate either. Therefore the following very simple protocol for broadcasting $x \in \mathbb{F}_{p^k}$ is sufficient:

1. The broadcaster sends $x$ to all players.

2. Each player sends to all players what he received in the previous step.

3. Each player checks that he received the same value $x$ from all players. If so, he outputs $x$; otherwise he aborts.

This protocol has communication complexity $O(n^2)$ field elements for one broadcast. However, this can be optimized in case we need to broadcast many values. Below, we assume each player sends one value, say $P_i$ wants to send $x_i$. We also assume that we have a random value [s] from the preprocessing, and that we have an $\epsilon$-almost universal class of hash functions $\{h_s\}$ for negligible $\epsilon$, indexed by values $s$, taking as input strings of $n$ elements in $\mathbb{F}_{p^k}$ and producing output in $\mathbb{F}_{p^k}$. A simple example is where we view the input $F$ as specifying coefficients of a polynomial of degree $n - 1$, and $h_s(F)$ is the result of evaluating this polynomial at the point $s$. If two inputs $F, F'$ are distinct, their difference has at most $n - 1$ roots, so the probability that $h_s(F) = h_s(F')$ is at most $(n - 1)/p^k$. The protocol goes as follows:

1. $P_i$ sends $x_i$ to all players.

2. [s] is opened.

3. Each player sends to all players $h_s(F)$, where $F$ is the string of values he received in the first step.

4. Each player checks that he received the same hash value from all players. If so, he outputs $x_1, \ldots, x_n$ as received in the first step; otherwise he aborts.

It is clear that if a player sent different data to different honest players, some honest player will abort, except with probability $(n - 1)/p^k$. This protocol has complexity $O(n^2)$, including also the cost of opening [s], but the cost per value we broadcast is only $O(n)$. This protocol generalizes easily to a case where one player has $n$ values to broadcast.
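The hash $h_s$ and the consistency check can be sketched in Python as follows. This is a simplification: a prime field stands in for $\mathbb{F}_{p^k}$, and every player is assumed to report the hash of its own view honestly in step 3, so only inconsistencies introduced in step 1 are modelled. The names `poly_hash` and `check_views` are hypothetical.

```python
def poly_hash(values, s, p):
    """h_s(F): evaluate the polynomial whose coefficients are F (constant term
    first) at the point s, using Horner's rule."""
    acc = 0
    for coeff in reversed(values):
        acc = (acc * s + coeff) % p
    return acc


def check_views(views, s, p):
    """views[j] is the string of n values player j received in step 1.
    Output the values only if all players' hashes agree (None models an abort)."""
    hashes = {poly_hash(view, s, p) for view in views}
    return views[0] if len(hashes) == 1 else None


p = 101
consistent = [1, 2, 3, 4]
inconsistent = [1, 2, 9, 4]  # one sender gave this player a different x_3
print(check_views([consistent] * 4, 5, p))  # [1, 2, 3, 4]
print(check_views([consistent, inconsistent, consistent, consistent], 5, p))  # None
```

Here the two views differ only in the coefficient of $s^2$, so their hashes collide only at $s = 0$, well within the bound of $n - 1 = 3$ bad points.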

In the online protocol we specified before, broadcast is used to give inputs in the first stage. Here, all players broadcast a value, and this is readily implemented with the optimized broadcast protocol above, so we get complexity $O(n)$ per input gate. If players have several inputs, we just execute several instances of this broadcast.

The only other point where broadcast is used is in partial openings, where a designated player $P_i$ broadcasts the value that is to be opened. Here, we can simply buffer the values sent until we have $n$ of them and then do the check in steps 3-4 above that $P_i$ has sent the same values to all players. Note that even if we allow $P_i$ to send different data to different players for a while, this does not allow information to leak: the fact observed in the simulation proof above, that in any partial opening the honest players always send random independent values, still holds even if $P_i$ has sent inconsistent data in previous rounds.
A.4 Running the Online Phase with Small Fields

Suppose we want error probability $2^{-sec}$ and $\log p^k$ is much smaller than $sec$.

When we consider how to solve this problem, we will at first ignore Step 1 in the Multiply stage of the online protocol, where one triple is "sacrificed" to check another, as this step could be done as part of the preprocessing. Nevertheless, we do not want to ignore the fact that this step will have a large error probability $1/p^k$. We could solve this by sacrificing $D = \lceil sec / \log p^k \rceil$ triples instead of one, but we can do much better, as described below in Section "A Smaller Sacrifice".

Going back to the actual online phase, we can compensate for the fact that $\log p^k$ is much smaller than $sec$ by setting up the preprocessing so it can work over an extension field $K$ of $\mathbb{F}_{p^k}$ of degree $D = \lceil sec / \log p^k \rceil$, i.e., an element in $K$ is represented as $D$ elements from $\mathbb{F}_{p^k}$. All MAC keys and MACs will be generated in $K$, whereas all values to be computed on will still be in $\mathbb{F}_{p^k}$. The preprocessing can ensure this because the ZK proof can already force a prover to choose plaintexts that decode to elements in a subfield of $K$.

Then error probabilities in the proof of the online phase that were $1/p^k$ before will now be $1/|K| \le 2^{-sec}$. The computational complexity of the online phase will now be $O(n|C| + n^3)$ elementary operations in $K$. Asymptotically, this amounts to $O((n|C| + n^3) D \log D \log\log D)$ elementary operations in $\mathbb{F}_{p^k}$, where the overhead for storage and communication is just $D$.
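The degree $D$ can be computed with exact integer arithmetic. The small Python helper below (a hypothetical name) returns the least $D$ with $(p^k)^D \ge 2^{sec}$, which equals $\lceil sec / \log_2 p^k \rceil$ and guarantees $1/|K| \le 2^{-sec}$; the same $D$ is the number of triples one would sacrifice in the naive approach.

```python
def extension_degree(p, k, sec):
    """Smallest D with (p**k)**D >= 2**sec, i.e. D = ceil(sec / log2(p**k)),
    computed without floating-point logarithms."""
    q = p ** k
    D = 1
    while q ** D < 2 ** sec:
        D += 1
    return D


for p, k in [(2, 8), (3, 1), (2**61 - 1, 1)]:
    D = extension_degree(p, k, 40)
    print(p, k, D, (p ** k) ** D >= 2 ** 40)
# D = 5, 26 and 1 respectively, and each check prints True
```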

It is also possible to get error probability $2^{-\mathsf{sec}}$ while having the preprocessing work only over
$\mathbb{F}_{p^k}$. Here the overhead will be larger, namely $\log D \log\log D$, but this may be the best option
when $D$ is not very large. The idea is to authenticate every value by computing $D$ MACs in parallel over $\mathbb{F}_{p^k}$,
using $D$ independent keys.

We will still do the linear combination $a = \sum_j e_j a_j$ over $K$, where $e_j = e^j$. This can be done
by having the preprocessing generate $D$ random values and thinking of these as an element $e \in K$.
Note, however, that we also have to compute a linear combination of the corresponding shares of
MACs, i.e., $\gamma_i = \sum_j e_j \gamma(a_j)_i$, and we have $D$ such MACs in parallel. This is why we get an overhead
factor $\log D \log\log D$ for the computational work in this case.
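To see why $D$ independent keys suffice, the toy sketch below works over a prime field $\mathbb{F}_p$ (so $k = 1$, an assumption made for simplicity) and enumerates all key vectors $\alpha \in \mathbb{F}_p^D$. An adversary who shifts an authenticated value by $\delta \ne 0$ must also shift the $j$-th MAC by $\alpha_j \delta$ without knowing $\alpha$; whatever fixed offsets it guesses, exactly one key vector accepts, so the forgery passes with probability $1/p^D$. Secret sharing of the MACs is omitted, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def forgery_success_probability(p, D, delta, guesses):
    """Fraction of key vectors alpha in F_p^D for which shifting the value x by delta
    and the j-th MAC by guesses[j] still passes all D MAC checks (toy, k = 1)."""
    assert delta % p != 0 and len(guesses) == D
    x = 5 % p                                    # an arbitrary authenticated value
    forged_x = (x + delta) % p
    accepted = 0
    for alpha in product(range(p), repeat=D):    # every possible key vector
        macs = [a * x % p for a in alpha]        # gamma_j = alpha_j * x
        forged = [(g + e) % p for g, e in zip(macs, guesses)]
        # the forgery is accepted iff every MAC equals alpha_j * forged_x
        if all(f == a * forged_x % p for f, a in zip(forged, alpha)):
            accepted += 1
    return Fraction(accepted, p**D)


print(forgery_success_probability(5, 3, delta=2, guesses=(1, 4, 0)))  # 1/125
```

Each accepted key must satisfy $\alpha_j \delta = g_j$ for every $j$, which pins down $\alpha$ uniquely; this is the $D$-fold independence the parallel-MAC approach relies on.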


A Smaller Sacrifice. In this section we describe a different method to check the multiplicative
relation on triples $\langle a\rangle, \langle b\rangle, \langle c\rangle$, where $a, b, c \in \mathbb{F}_{p^k}$. The aim is to decrease the (amortized) number
of triples to sacrifice per check. Our approach resembles a technique introduced by Ben-Sasson et
al. in [4] and one by Cramer et al. in [10].

The first step in our construction is to consider a batch of $t + 1$ triples $\langle a_i\rangle, \langle b_i\rangle, \langle c_i\rangle$ for
$i = 1, \ldots, t + 1$ at once. There are two main ideas in the construction: the first one is to interpolate
the values and get polynomials $A, B, C \in \mathbb{F}_{p^k}[X]$ such that $A(i) = a_i$, $B(i) = b_i$, $C(i) = c_i$; if the
triples were correctly generated, one would expect $A(x)B(x) = C(x)$ for all $x$. The second idea
is to think of $A, B, C$ as polynomials over a field extension $K$ of $\mathbb{F}_{p^k}$, so that one can check the
expected multiplicative relation by evaluating $A, B, C$ at a random element $z \in K$; the probability
that the check passes even if some of the triples did not satisfy the relation is inversely proportional
to the size of $K$. We now present the full construction.

— Let $\langle a_i\rangle, \langle b_i\rangle, \langle c_i\rangle$, $i = 1, \ldots, t + 1$, be a batch of triples to check.

— One can think of the values $a_1, \ldots, a_{t+1}$ (resp. $b_1, \ldots, b_{t+1}$) as $t + 1$ evaluations over $\mathbb{F}_{p^k}$ of a
unique polynomial $A \in \mathbb{F}_{p^k}[X]$ (resp. $B \in \mathbb{F}_{p^k}[X]$) of degree at most $t$. Concretely, one can define the
polynomial $A$ (resp. $B$) such that $A(i) = a_i$ (resp. $B(i) = b_i$). Since the coefficients of $A$ (resp.
$B$) can be computed as a linear combination of the $a_i$'s (resp. $b_i$'s), the players can compute
representations of such coefficients by local computation.

— Players can compute $\langle a_{t+2}\rangle, \ldots, \langle a_{2t+1}\rangle$ (resp. $\langle b_{t+2}\rangle, \ldots, \langle b_{2t+1}\rangle$) such that $A(i) = a_i$ (resp. $B(i) = b_i$),
again by local computation, since evaluating a polynomial is a linear operation.

— Players can engage in the multiplication step of the online phase with input $\langle a_i\rangle, \langle b_i\rangle$ and get
$\langle c_i\rangle$ (hopefully $c_i = a_i b_i$) for $i = t + 2, \ldots, 2t + 1$. Notice that players call the multiplication step
$t$ times here, so they sacrifice $t$ triples.

— Using only linear computation, players can now compute representations of the coefficients of the
unique polynomial $C \in \mathbb{F}_{p^k}[X]$ of degree at most $2t$ such that $C(i) = c_i$ for $i = 1, \ldots, 2t + 1$.

— Let $K$ be a field extension of $\mathbb{F}_{p^k}$ of degree $D$. It is possible to think of $A, B, C$ as polynomials
over $K$, by embedding the coefficients via the natural map $\mathbb{F}_{p^k} \to K$. Players now compute
representations of $A(z)B(z)$ and $C(z)$, where $z$ is a public random element in $K$, and check whether
$A(z)B(z) = C(z)$ by outputting $A(z)B(z) - C(z)$ and checking that the result is zero. This check
can be repeated a number of times in order to lower the error probability. If the check passes
every time, players consider the original triples valid; otherwise, they discard the triples
and start again with fresh triples.
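The steps above can be traced in the clear with the following Python sketch. It is a toy under explicit assumptions: there is no secret sharing or MAC checking, all values are public, and a single large prime field $\mathbb{F}_P$ stands in for both $\mathbb{F}_{p^k}$ and the extension $K$ (the argument only needs $z$ to come from a field with many more than $2t$ elements). The names `lagrange_eval` and `check_batch` are hypothetical.

```python
import random

P = 2**61 - 1  # a prime; F_P plays the role of both F_{p^k} and K in this toy


def lagrange_eval(xs, ys, z):
    """Evaluate at z the unique polynomial of degree < len(xs) through (xs[i], ys[i]) over F_P."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (z - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def check_batch(a, b, c, z):
    """One run of the test on the t+1 triples (a_i, b_i, c_i), i = 1..t+1, at the point z."""
    t = len(a) - 1
    xs = list(range(1, t + 2))                       # A(i) = a_i, B(i) = b_i for i = 1..t+1
    ext = list(range(t + 2, 2 * t + 2))              # new points t+2..2t+1
    a_ext = [lagrange_eval(xs, a, x) for x in ext]   # local (linear) computation
    b_ext = [lagrange_eval(xs, b, x) for x in ext]
    # In the protocol these t products cost t sacrificed triples; here we simply multiply.
    c_ext = [u * v % P for u, v in zip(a_ext, b_ext)]
    # C has degree <= 2t and C(i) = c_i for i = 1..2t+1.
    c_z = lagrange_eval(xs + ext, list(c) + c_ext, z)
    ab_z = lagrange_eval(xs, a, z) * lagrange_eval(xs, b, z) % P
    return ab_z == c_z


rng = random.Random(1)
t = 7
a = [rng.randrange(P) for _ in range(t + 1)]
b = [rng.randrange(P) for _ in range(t + 1)]
good = [u * v % P for u, v in zip(a, b)]
bad = list(good)
bad[3] = (bad[3] + 1) % P                            # one incorrect triple, at the point x = 4
z = rng.randrange(P)                                 # public random challenge
print(check_batch(a, b, good, z), check_batch(a, b, bad, z))
```

With the corrupted triple, $AB - C$ is a nonzero polynomial of degree at most $2t$ that vanishes at the other $2t$ interpolation points, so the bad batch is accepted only for those $2t$ values of $z$.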

Notice that in order to compute $A(z)B(z)$ and $C(z)$, players need to compute at most $D^2$
multiplications over $\mathbb{F}_{p^k}$, since $A(z)B(z)$ can be computed by multiplying a $D \times D$ matrix (dependent
on $A(z)$) with the vector $B(z)$ (over $K$, multiplication by a fixed element is an endomorphism of $K$
as an $\mathbb{F}_{p^k}$-vector space). Notice also that we may use the old method of sacrificing more than one
triple per multiplication to get any desired error probability for the multiplications over $\mathbb{F}_{p^k}$. We
analyze below the error probability we must require.

For the analysis of the construction, one sees that if the multiplicative relation was satisfied by
all the original triples, the polynomials $AB$ and $C$ are equal, so the final test passes. In case the
triples did not satisfy the relation, the polynomials $AB$ and $C$ are different, but since they both
have degree at most $2t$, they can agree in at most $2t$ points. Therefore, if $z$ is a root of $AB - C$,
then the test passes, and a uniform element of $K$ is a root of $AB - C$ with probability at most
$2t/|K|$. If $z$ is not a root of $AB - C$, the test passes only if the multiplication $A(z)B(z)$ does not give
the correct result, so if we make sure this happens with probability at most $2t/|K|$ (by sacrificing
enough triples in the process), then the error probability of a single run of the test is essentially bounded by $2t/|K|$
(at most twice this value, by a union bound). In order to get negligible error probability we repeat this phase enough
times.

An important fact to notice is that in this construction we need $2t + 1 \le p^k$, since otherwise
there are not enough elements to evaluate the polynomials. In order to circumvent this restriction,
one can still apply the above construction but replace $\mathbb{F}_{p^k}$ with an extension $\mathbb{F}_{p^{k'}}$ with the required
property.

Asymptotically, we see that as we increase the number $t + 1$ of triples checked, we always need
to sacrifice $t$ triples, plus the number we need to check the multiplication(s) in $K$. If
we assume that we want to hit the desired error probability with just one iteration of the test, we
have $2^{-\mathsf{sec}} = 2t/|K|$, from which we get $\log |K| = \mathsf{sec} + \log 2t$. The degree of $K$ over $\mathbb{F}_{p^k}$ is
$\log |K| / \log p^k$, and the number of basic secure multiplications we need is at most the square of this
number, which is $(\mathsf{sec} + \log 2t)^2/(\log p^k)^2$. For each of these, we need error essentially $2^{-\mathsf{sec}}$, so the
number of triples we need, say $m$, satisfies $2^{-\mathsf{sec}} = (1/p^k)^m$, so we get $m = \mathsf{sec}/\log p^k$. This in total
grows only poly-logarithmically with $t$, so we conclude that for a given desired error probability,
the number of triples we need to sacrifice to check $t + 1$ triples is $O(t + \mathrm{polylog}(t))$.
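The count in the previous paragraph can be tabulated with a short Python sketch. It follows the estimates in the text literally (at most $D^2$ small multiplications with $D = \lceil (\mathsf{sec} + \log 2t)/\log p^k \rceil$, each using $\lceil \mathsf{sec}/\log p^k \rceil$ triples) and ignores the requirement $2t + 1 \le p^k$; the function name is hypothetical.

```python
import math


def triples_sacrificed(t, sec, log_pk):
    """Rough number of triples sacrificed to check t+1 triples with a single run of the test,
    following the estimates in the text (log_pk = log2 of p^k)."""
    D = math.ceil((sec + math.log2(2 * t)) / log_pk)   # degree of K over F_{p^k}
    small_mults = D * D                                 # multiplications in F_{p^k}
    m = math.ceil(sec / log_pk)                         # triples per small multiplication
    return t + small_mults * m


for t in (2**10, 2**14, 2**20):
    s = triples_sacrificed(t, sec=80, log_pk=8)
    print(t + 1, s, round(s / t, 4))   # the ratio s / t tends to 1 as t grows
```

The additive term grows only like $(\log t)^2$, which is the $O(t + \mathrm{polylog}(t))$ behaviour claimed above.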
Comparing the Two Approaches: A Concrete Example. We here compare the above approaches
for checking triples. Suppose $p = 2$ and $k = 8$, so $\mathbb{F}_{p^k} = \mathbb{F}_{2^8}$. Suppose there are also
$t + 1 = 128$ triples to check with security level $2^{-80}$.

Using the latter approach, with $K = \mathbb{F}_{2^{16}}$, we need to sacrifice $t = 127$ triples to generate
$\langle c_{t+2}\rangle, \ldots, \langle c_{2t+1}\rangle$; moreover, we need to perform 4 secure multiplications to check whether $A(z)B(z) =
C(z)$, since $K$ is a vector space of dimension 2 over $\mathbb{F}_{2^8}$. In order for the multiplications to be
secure enough, we need them to be correct up to error probability $(2 \cdot 127)/2^{16} \approx 2^{-8}$ for the entire
multiplication $A(z)B(z)$. This will be the case if for each of the 4 small multiplications we use 3
triples, namely one to do the actual multiplication and two to check the first
one. This gives a total error of at most $4 \cdot 2^{-16} = 2^{-14} < 2^{-8}$. So, since one run of the test leads to an error
probability of about $2^{-8}$, we need 10 runs to decrease the error probability to $2^{-80}$. Therefore, the total
number of triples to sacrifice is $128 + 4 \cdot 3 \cdot 10 = 248$, while with the original approach the number
of triples to sacrifice would have been $128 \cdot 10 = 1280$.
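The arithmetic of this example can be reproduced with a few lines of Python. This is only a transcription of the numbers above: as in the text, a single run of the test is treated as erring with probability about $2^{-8}$, and the total uses the same count $128 + 4 \cdot 3 \cdot 10$.

```python
import math

sec = 80          # target error probability 2^-80
base_bits = 8     # F_{p^k} = F_{2^8}
ext_bits = 16     # K = F_{2^16}
t = 127           # t + 1 = 128 triples to check

per_run = 2 * t / 2**ext_bits                 # 254 / 2^16, just below 2^-8
dim = ext_bits // base_bits                   # K has dimension 2 over F_{2^8}
small_mults = dim * dim                       # 4 multiplications in F_{2^8}
triples_per_mult = 3                          # 1 to multiply, 2 sacrificed to check it
mult_err = small_mults * (2.0**-base_bits) ** (triples_per_mult - 1)   # 4 * 2^-16 = 2^-14
assert mult_err < per_run <= 2.0**-base_bits

runs = math.ceil(sec / base_bits)             # one run errs with probability about 2^-8
new_cost = (t + 1) + small_mults * triples_per_mult * runs   # count as given in the text
old_cost = (t + 1) * runs                     # one sacrifice (error 2^-8) per check, 10 times
print(runs, new_cost, old_cost)               # 10 248 1280
```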

A.5 Preprocessing Phase

Proof  (Theorem  3). 

Recall first that we assume the cryptosystem has an alternative key generation algorithm
$\mathsf{KeyGen}^*()$, which is a randomized algorithm that outputs a meaningless public key $\widetilde{\mathsf{pk}}$ with the
property that an encryption of any message, $\mathsf{Enc}_{\widetilde{\mathsf{pk}}}(x)$, is statistically indistinguishable from an
encryption of 0. Furthermore, if we set $(\mathsf{pk}, \mathsf{sk}) \leftarrow \mathsf{KeyGen}()$ and $\widetilde{\mathsf{pk}} \leftarrow \mathsf{KeyGen}^*()$, then $\mathsf{pk}$ and $\widetilde{\mathsf{pk}}$ are
computationally indistinguishable.

We construct a simulator $\mathcal{S}_{\mathsf{Prep}}$ for $\Pi_{\mathsf{Prep}}$. In a nutshell, the simulator will run a copy of the
protocol. Here, it will play the honest players' part while the environment $Z$ plays for the corrupt
players. The simulator also internally runs copies of $\mathcal{F}_{\mathsf{KeyGen}}$ and $\mathcal{F}_{\mathsf{Rand}}$ in order to simulate calls
to these functionalities. Note that in the following we say that the simulator executes or performs
some part of the protocol as shorthand for the simulator going through that part with $Z$. During
the protocol execution, whenever $Z$ sends ciphertexts on behalf of corrupt players, the simulator
can obtain the plaintexts, since it knows the secret key. These values are then used to generate
input to $\mathcal{F}_{\mathsf{Prep}}$. A precise description is provided in Figure 12.

We now need to show that no $Z$ can distinguish between the simulated and the real process. For
contradiction, assume that there exists a $Z$ that can distinguish these two cases with non-negligible
advantage $\varepsilon$. The output of $Z$ is a single bit, thought of as a guess at one of the two cases. Concretely,
we assume

$$\Delta(Z) := \Pr[\text{``Real''} \leftarrow Z(\text{Real process})] - \Pr[\text{``Real''} \leftarrow Z(\text{Simulated process})] > \varepsilon.$$

We  will  show  that  such  Z  can  be  used  to  distinguish  between  a  normally  generated  public  key  and 
a  meaningless  one  with  basically  the  same  advantage.  This  leads  to  a  contradiction,  since  a  key 
generated  by  the  normal  key  generator  is  computationally  indistinguishable  from  a  meaningless 
one. 

In more detail, we construct an algorithm $B$ that takes as input a public key $\mathsf{pk}^*$ (randomly
chosen as either a normal public key or a meaningless one), sets up a copy of $Z$, goes through the
protocol with $Z$ and uses its output to guess the type of key it got as input. During the process
$B$ uniformly chooses a bit (that can be thought of as a switch between "Real" and "Simulation"): in
Simulator $\mathcal{S}_{\mathsf{Prep}}$

SReshare($e_m$): This is a subroutine the simulator will use while executing the main steps of the protocol described
below. Any time in $\Pi_{\mathsf{Prep}}$ when there is a call to Reshare($e_m$), the simulator proceeds as the protocol, but
it performs the following extra tasks in order to retrieve the quantity $\Delta_m$.

— On step 2 the simulator decrypts $\mathsf{Enc}_{\mathsf{pk}}(f_1), \ldots, \mathsf{Enc}_{\mathsf{pk}}(f_n)$ and obtains the values $f_1, \ldots, f_n$.

— On step 5 the simulator performs step 2 of $\mathcal{F}_{\mathsf{KeyGenDec}}$, and thereby obtains $m + f$ by decrypting $e_{m+f}$, and
$(m + f)'$ from the adversary.

— The simulator sets $\Delta_m \leftarrow (m + f)' - (m + f)$, that is, $\Delta_m$ is the difference between the output chosen
by the adversary for the decryption of $e_{m+f}$ and the decryption itself.

— The simulator computes and stores $m_1 \leftarrow (m + f)' - f_1$, and $m_i \leftarrow -f_i$ for $i \ne 1$.

Initialize: 

— The simulator performs the initialization steps of $\Pi_{\mathsf{Prep}}$. The call to $\mathcal{F}_{\mathsf{KeyGenDec}}$ in step 1 is simulated
by running KeyGen to generate the key pair $(\mathsf{pk}, \mathsf{sk})$. The simulator then sends $\mathsf{pk}$ to the players and
stores $\mathsf{sk}$.

— Steps 2-5 are performed according to the protocol, but the simulator decrypts every broadcast ciphertext
and obtains $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n$.

— Step 6 is performed according to the protocol, but the simulator gets $\Delta_1 \leftarrow$ SReshare($e_{\alpha \cdot \beta_1}$), ..., $\Delta_n \leftarrow$
SReshare($e_{\alpha \cdot \beta_n}$).

— The simulator calls Initialize on $\mathcal{F}_{\mathsf{Prep}}$ with input $\{\alpha_i\}_{i \in A}$ at step 1, $\{\beta_i\}_{i \in A}$ at step 3 and $\Delta_1, \ldots, \Delta_n$
at step 5.

Pair: 

— The simulator performs step 1 according to the protocol.

— Steps 2-3 are performed according to the protocol, but the simulator decrypts every broadcast ciphertext
and obtains $r_1, \ldots, r_n$.

— Step 4 is performed according to the protocol, but the simulator gets $\Delta \leftarrow$ SReshare($e_{r \cdot \alpha}$), $\Delta_1 \leftarrow$
SReshare($e_{r \cdot \beta_1}$), ..., $\Delta_n \leftarrow$ SReshare($e_{r \cdot \beta_n}$).

— The simulator calls Pair on $\mathcal{F}_{\mathsf{Prep}}$ with input $\{r_i\}_{i \in A}$ at step 1, and $\Delta, \Delta_1, \ldots, \Delta_n$ at step 3.

Triple: 

— The simulator performs step 1 according to the protocol.

— Steps 2-3 are performed according to the protocol, but the simulator decrypts every broadcast ciphertext
and obtains $a_1, \ldots, a_n, b_1, \ldots, b_n$.

— Steps 4-5 are performed according to the protocol, but the simulator gets $\Delta_a \leftarrow$ SReshare($e_{a \cdot \alpha}$), $\Delta_b \leftarrow$
SReshare($e_{b \cdot \alpha}$).

— Steps 6-7 are performed according to the protocol, but the simulator gets $c_1, \ldots, c_n$ and $\delta \leftarrow$ SReshare($e_c$).

— Step 8 is performed according to the protocol, but the simulator gets $\Delta_c \leftarrow$ SReshare($e_{c \cdot \alpha}$).

— The simulator calls Triple on $\mathcal{F}_{\mathsf{Prep}}$ with input $\{a_i\}_{i \in A}, \{b_i\}_{i \in A}$ at step 1, $\Delta_a, \Delta_b, \delta$ at step 3, $\{c_i\}_{i \in A}$
at step 5, and $\Delta_c$ at step 7.

Fig. 12. The simulator for $\mathcal{F}_{\mathsf{Prep}}$.


case $\mathsf{pk}^*$ is correctly computed, if the bit is set to "Real", $Z$'s view is indistinguishable from a real
execution of the protocol, while if the bit is set to "Simulation", $Z$'s view is indistinguishable from
a simulated run. However, in case $\mathsf{pk}^*$ is meaningless, both choices of the bit lead to statistically
indistinguishable views. Hence, if $Z$ guesses correctly whether $B$ chose "Real" or "Simulation", $B$
guesses that $\mathsf{pk}^*$ was a standard public key; otherwise $B$ guesses that $\mathsf{pk}^*$ was meaningless.

For simplicity we describe the algorithm $B$ for the two-party setting, where there is a corrupt
party $P_1$ and an honest party $P_2$. On input $\mathsf{pk}^*$, where $\mathsf{pk}^*$ is a public key (either meaningless or
standard), $B$ starts executing the protocol $\Pi_{\mathsf{Prep}}$, playing for $P_2$, while $Z$ plays for $P_1$. $B$ does
exactly what the simulator would do, with some exceptions:

1. It uses the public key it got as input, instead of generating a key pair initially.

2. $B$ cannot decrypt ciphertexts from $P_1$ since it does not know the secret key (e.g. at step 4
of Initialize, step 2 of Pair, step 2 of Triple, etc.). Instead, $B$ exploits that $P_1$ and $P_2$ ran the
protocol $\Pi_{\mathsf{ZKPoPK}}$ with $P_1$ as prover. That is, $P_1$ proved that he knows encodings of appropriate
size corresponding to the plaintexts inside the ciphertexts broadcast in the previous step. This
means $B$ can use the knowledge extractor of the protocol $\Pi_{\mathsf{ZKPoPK}}$ followed by decoding to
extract the shares from $P_1$ (e.g. at step 4 of Initialize, etc.). At this point $B$ continues the
protocol as if it had decrypted. Note that the knowledge extractor requires rewinding of the
prover (which here effectively is $Z$). $B$ can do this as it runs its own copy of $Z$, and since it also
controls the copy of $\mathcal{F}_{\mathsf{Rand}}$ used in the protocol, it can issue challenges of its choice to $Z$.

3. When $P_2$ gives a ZK proof for a set of ciphertexts, $B$ will simulate the proof. This is done by
running the honest-verifier simulator to get a transcript $(a, e, (z, T))$ and letting the copy of
$\mathcal{F}_{\mathsf{Rand}}$ output the challenge $e$ that occurs in the simulated transcript.


In the end $B$ uniformly chooses to generate a real or a simulated view. In the first case, $B$
outputs to $Z$ exactly those values for $P_2$ that were used in the execution of the protocol. In the
other case, $B$ generates the output for $P_2$ as $\mathcal{F}_{\mathsf{Prep}}$ would do. That means that $P_2$'s shares $a_2, b_2, c_2$
of a triple $\langle a\rangle, \langle b\rangle, \langle c\rangle$ will be determined by choosing $a, b$ at random, setting $c \leftarrow a \cdot b$ and then
letting $a_2 \leftarrow a - a_1$, $b_2 \leftarrow b - b_1$, $c_2 \leftarrow c - c_1$, where $a_1, b_1, c_1$ are the shares of $P_1$.

It can now be seen that if $\mathsf{pk}^*$ is a normal key, then the view generated by $B$ corresponds
statistically to either a real or a simulated execution: if $B$ chooses the simulation case, the only
differences to the actual simulator are 1) the simulator executes the ZK proofs given by $P_2$ according
to the protocol while $B$ simulates them; and 2) the simulator opens the ciphertexts using the secret
key to decrypt, while $B$ uses the extractor for $\Pi_{\mathsf{ZKPoPK}}$ and computes the plaintexts from its results.
As for 1), the ZK proof is statistical ZK, so this leads to a statistically indistinguishable distribution.
As for 2), note that for every ciphertext $c_x$ generated by $P_1$, the extractor for $\Pi_{\mathsf{ZKPoPK}}$ will, except
with negligible probability, be able to find an encoding $x$ (resp. randomness $r$) smaller than $B_{\mathsf{plain}}$
(resp. $B_{\mathsf{rand}}$) with $c_x = \mathsf{Enc}_{\mathsf{pk}}(x, r)$. This follows from soundness of $\Pi_{\mathsf{ZKPoPK}}$ and admissibility of
the cryptosystem. Then, by correctness of the cryptosystem, computing the plaintexts as $B$ does
will indeed give the same result as decrypting, except with negligible probability. If $B$ chooses the
real case, a similar argument shows that we get a view statistically indistinguishable from a real
run of the protocol. Hence if $\mathsf{pk}^*$ is a normal key, $Z$ can guess $B$'s choice of "Real" or "Simulation"
with advantage essentially $\varepsilon$.

On the other hand, if $\mathsf{pk}^*$ is a meaningless key, the encryptions contain statistically no information
about the values inside. Moreover, all messages sent in the zero-knowledge protocols where $P_2$ acts
as prover do not depend on the specific values that $P_2$ has, since the proofs are simulated. We
conclude that essentially no information on any value held by $P_2$ is revealed. This is also the case
for step 5 of Reshare($e_m$): $m + f$ is retrieved, but no information on $m$ is revealed, since $f$ is uniform.

The view $Z$ sees consists of the view of the corrupt player(s) and the output of the honest
player(s). We just argued that the view of the corrupt player is essentially independent of the
internal values $B$ uses for $P_2$, and hence also independent of whether $B$ chooses the real or the
simulated case. Therefore, the output generated for the honest player(s) seen by $Z$ is in both cases a
set of (essentially) uniformly and independently chosen shares and MAC keys. As a result, if we use
a meaningless key, a real execution and a simulated execution are statistically indistinguishable, and
the guess of $Z$ will equal $B$'s random choice of "Real" or "Simulation" with probability essentially
1/2.
An easy calculation now shows that the advantage of $B$ is

$$\Delta(B) := \Pr[\text{``Standard Key''} \leftarrow B(\mathsf{pk})] - \Pr[\text{``Standard Key''} \leftarrow B(\widetilde{\mathsf{pk}})] \ge \Delta(Z)/2 - \delta \ge \varepsilon/2 - \delta,$$

for some negligible $\delta$ that accounts for the differences between the involved distributions. However,
if $\varepsilon$ is non-negligible, then $\varepsilon/2 - \delta$ is also non-negligible, which contradicts the assumption that
meaningless keys are computationally indistinguishable from standard ones. □
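The "easy calculation" can be spelled out as follows. This is a sketch: $b$ denotes the uniformly random bit chosen by $B$, and $\delta_1, \delta_2$ are our own names for the negligible statistical distances discussed above, with $\delta = \delta_1 + \delta_2$.

```latex
\begin{align*}
% Standard key: B outputs "Standard Key" exactly when Z's guess matches the bit b.
\Pr[\text{``Standard Key''} \leftarrow B(\mathsf{pk})]
  &= \tfrac12 \Pr[\text{``Real''} \leftarrow Z \mid b = \text{Real}]
   + \tfrac12 \bigl(1 - \Pr[\text{``Real''} \leftarrow Z \mid b = \text{Simulation}]\bigr) \\
  &\ge \tfrac12 + \tfrac12\,\Delta(Z) - \delta_1, \\
% Meaningless key: Z's view is statistically independent of b.
\Pr[\text{``Standard Key''} \leftarrow B(\widetilde{\mathsf{pk}})] &\le \tfrac12 + \delta_2, \\
% Subtracting the two lines:
\Delta(B) &\ge \tfrac12\,\Delta(Z) - (\delta_1 + \delta_2) \ge \tfrac{\varepsilon}{2} - \delta .
\end{align*}
```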


A.6 Distributed Decryption

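Proof (Theorem 4). The requirement $B + 2^{\mathsf{sec}} \cdot B < q/2$ implies that $t' \equiv t \pmod p$, since $\|r_i\|_\infty \le
2^{\mathsf{sec}} \cdot B/(n \cdot p)$ for $i = 1, \ldots, n$, so that $t + p \sum_i r_i$ has infinity norm at most $B + 2^{\mathsf{sec}} \cdot B < q/2$
and no wrap-around modulo $q$ occurs. Therefore the protocol allows players to retrieve the correct message
if all the players are honest.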
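A scalar toy version of this correctness argument is sketched below in Python. It is deliberately insecure and simplified under stated assumptions: the ring is replaced by $\mathbb{Z}_q$ with a scalar secret key, $\mathsf{encode}$ is the identity, and the ciphertext is just a pair $(c_0, c_1)$ with $c_0 - s c_1 \equiv t \pmod q$, $|t| \le B$ and $t \equiv m \pmod p$. The function names are hypothetical; the point is only that, under the requirement $B + 2^{\mathsf{sec}} \cdot B < q/2$, the centred sum of the $t_i$ is exactly $t + p \sum_i r_i$, which reduces to $m$ modulo $p$.

```python
import random


def centred(x, q):
    """Representative of x mod q in the interval (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


def toy_distributed_decrypt(m, p, q, B, n, sec, rng):
    """Scalar toy of distributed decryption; returns the message recovered by the players."""
    assert B + 2**sec * B < q / 2 and 0 <= m < p <= B
    s = rng.randrange(q)                              # toy secret key in Z_q
    sk = [rng.randrange(q) for _ in range(n - 1)]
    sk.append((s - sum(sk)) % q)                      # additive shares: sum(sk) = s mod q
    K = (B - p) // p
    t = m + p * rng.randint(-K, K)                    # noise: |t| <= B and t = m (mod p)
    c1 = rng.randrange(q)
    c0 = (s * c1 + t) % q                             # toy ciphertext: c0 - s*c1 = t (mod q)
    bound = 2**sec * B // (n * p)                     # |r_i| <= 2^sec * B / (n p)
    shares = []
    for i in range(n):
        v = ((c0 if i == 0 else 0) - sk[i] * c1) % q  # v_i, with sum(v_i) = t (mod q)
        r = rng.randint(-bound, bound)                # smudging noise r_i
        shares.append((v + p * r) % q)                # t_i = v_i + p * r_i
    t_prime = centred(sum(shares), q)                 # equals t + p * sum(r_i) exactly
    return t_prime % p                                # = t mod p = m


rng = random.Random(2024)
print([toy_distributed_decrypt(m, 7, 2**31 - 1, 100, 3, 20, rng) for m in range(7)])
```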

We now build a simulator $\mathcal{S}_{\mathsf{DDec}}$ to work on top of $\mathcal{F}_{\mathsf{KeyGenDec}}$, such that the adversary cannot
distinguish whether it is playing with the decryption protocol and $\mathcal{F}_{\mathsf{KeyGen}}$ or with the simulator and
$\mathcal{F}_{\mathsf{KeyGenDec}}$. We let $A$ denote the set of players controlled by the adversary.


Simulator $\mathcal{S}_{\mathsf{DDec}}$

Key  Generation:  This  stage  is  needed  to  distribute  shares  of  a  secret  key. 

— Upon "start", the simulator sends "start" to $\mathcal{F}_{\mathsf{KeyGenDec}}$ and obtains $\mathsf{pk}$. Moreover, the simulator obtains
$(\mathsf{sk}_i)_{i \in A}$ from the adversary.

— The simulator (internally) sets random $(\mathsf{sk}_i)_{i \notin A}$ such that $(\mathsf{sk}_i)_{i = 1, \ldots, n}$ is a full vector of shares of 0.

— The simulator sends $\mathsf{pk}$ to $A$.

Public  Decryption:  This  stage  simulates  a  public  decryption. 

— Upon "decrypt c, B", the simulator sends "decrypt c" to $\mathcal{F}_{\mathsf{KeyGenDec}}$ and obtains $m = \mathsf{Dec}_{\mathsf{sk}}(c)$.

— It then computes the value $v_i$ for all players except for an honest player $P_j$.

— It then samples $r_j$ uniformly with infinity norm bounded by $2^{\mathsf{sec}} \cdot B/(n \cdot p)$ and computes

$$t_j \leftarrow -\sum_{i \ne j} v_i + p \cdot r_j + \mathsf{encode}(m).$$

— For each other honest player $P_i$, it computes $t_i$ honestly (using $c$, $\mathsf{sk}_i$).

— The simulator broadcasts the values $(t_i)_{i \notin A, i \ne j}$ and $t_j$, and obtains $(t^*_i)_{i \in A}$ from the adversary.

— It then sends $m' \leftarrow \mathsf{decode}\big((\sum_{i \in A} t^*_i + \sum_{i \notin A} t_i) \bmod p\big)$ to $\mathcal{F}_{\mathsf{KeyGenDec}}$ so that the ideal
functionality sends "Result $m'$" to all the players.

Private  Decryption:  This  stage  simulates  a  private  decryption. 

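— Upon "decrypt c, B to $P_j$", the simulator sends "decrypt c to $P_j$" to $\mathcal{F}_{\mathsf{KeyGenDec}}$.

— If $P_j$ is corrupt, the simulator obtains $c, m = \mathsf{Dec}_{\mathsf{sk}}(c)$ from $\mathcal{F}_{\mathsf{KeyGenDec}}$ and acts as in the simulated
public decryption.

— If $P_j$ is honest, the simulator receives $c$ from $\mathcal{F}_{\mathsf{KeyGenDec}}$, $t^*_i$ from each corrupt player $P_i$ and $t_i$ from
each honest player.

• The simulator samples $r_j$ uniformly with infinity norm bounded by $2^{\mathsf{sec}} \cdot B/(n \cdot p)$.

• It evaluates $t_j \leftarrow -\sum_{i \ne j} v_i + p \cdot r_j$.

• It computes $e \leftarrow \big(t_j + \sum_{i \in A} t^*_i + \sum_{i \notin A, i \ne j} t_i\big) \bmod p$.

• Finally it sends $\delta \leftarrow \mathsf{decode}(e)$ to $\mathcal{F}_{\mathsf{KeyGenDec}}$ in order to get $\mathsf{Dec}_{\mathsf{sk}}(c) + \delta$ to $P_j$.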

Fig. 13. The simulator for $\Pi_{\mathsf{DDec}}$.






In a simulated decryption the adversary receives $\mathsf{pk}$ and the values $t_i$ of the honest players from $\mathcal{S}_{\mathsf{DDec}}$. The
distribution of $\mathsf{pk}$ is the same as in a real conversation, since it was sampled using the same algorithm
as in a real conversation. The distribution of the simulated $t_i$, $i \ne j$, is statistically close to the real
one, since $t_i$ was computed correctly using shares of a possible secret key. We can therefore focus
on the case where all the players but one are dishonest. We first analyse the simulation of public
decryption by introducing a hybrid machine, and prove that its output is statistically indistinguishable
from $P_j$'s output (in the real protocol) and perfectly indistinguishable from $P_j$'s simulated output.

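Hybrid: On input $(\mathsf{sk}_i)_{i = 1, \ldots, n}$, $c$: reconstruct $\mathsf{sk}$, compute $m = \mathsf{Dec}_{\mathsf{sk}}(c)$, sample $r_j$ uniformly with infinity
norm bounded by $2^{\mathsf{sec}} \cdot B/(n \cdot p)$ and output $\tilde{t}_j \leftarrow -\sum_{i \ne j} v_i + p \cdot r_j + \mathsf{encode}(m)$.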

Notice that $\tilde{t}_j = v_j - t + \mathsf{encode}(m) + p \cdot r_j$, where $t = \sum_i v_i$. Now, for a distribution $X$, define $\varphi(X) := p \cdot X + v_j$.
Notice that $t_j = \varphi(U)$, where $U$ denotes the uniform distribution over vectors of integral entries
bounded in infinity norm by $2^{\mathsf{sec}} \cdot B/(n \cdot p)$; moreover, since $t - \mathsf{encode}(m)$ is a multiple of $p$, one can write
$\tilde{t}_j = \varphi(U + (\mathsf{encode}(m) - t)/p)$. Since $\|(\mathsf{encode}(m) - t)/p\|_\infty \le (B + 1)/p$ and $U$ is uniform in an
exponentially larger range, the distribution $U + (\mathsf{encode}(m) - t)/p$ is statistically close to $U$.
Therefore $\tilde{t}_j$ is statistically close to $t_j$.
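The "exponentially larger range" argument is the standard smudging bound: shifting the uniform distribution on $\{-M, \ldots, M\}$ by an integer $s$ moves it by statistical distance exactly $|s|/(2M+1)$ (for $|s| \le 2M + 1$), and for vectors the distances of the coordinates add up. With $M \approx 2^{\mathsf{sec}} \cdot B/(n \cdot p)$ and $|s| \le (B+1)/p$ this is on the order of $n \cdot 2^{-\mathsf{sec}}$ per coordinate. The Python sketch below (function name hypothetical) computes the one-dimensional distance exactly.

```python
from fractions import Fraction


def sd_uniform_shift(M, s):
    """Exact statistical distance between U = uniform on {-M, ..., M} and U + s."""
    w = Fraction(1, 2 * M + 1)
    lo, hi = -M - abs(s), M + abs(s)
    diff = Fraction(0)
    for x in range(lo, hi + 1):
        pu = w if -M <= x <= M else 0          # Pr[U = x]
        ps = w if -M <= x - s <= M else 0      # Pr[U + s = x]
        diff += abs(pu - ps)
    return diff / 2


print(sd_uniform_shift(1000, 3))   # 3/2001
```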

What is left to prove is that the simulation of private decryption to an honest player $P_j$ is
statistically indistinguishable from the real protocol. In the real protocol $P_j$ computes $t_j$ and

$$m' \leftarrow \mathsf{decode}\Big(\Big(\sum_{i \in A} t^*_i + \sum_{i \notin A} t_i\Big) \bmod p\Big).$$

In that case the error $m' - m$ introduced by the adversary depends only on the value

$$e' = \Big(\sum_{i \in A} (t^*_i - t_i)\Big) \bmod p,$$

computed using the actual secret key, where for $i \in A$ we write $t_i$ for the value an honest $P_i$ would have sent. In the simulation the error introduced by the adversary is

$$e = \Big(t_j + \sum_{i \in A} t^*_i + \sum_{i \notin A, i \ne j} t_i\Big) \bmod p = \Big(\sum_{i \in A} (t^*_i - t_i)\Big) \bmod p,$$

computed using secret shares of 0. Since the secret sharing scheme has privacy threshold $n$ and the
sums involve at most $n - 1$ shares, the quantities $e$ and $e'$ are statistically indistinguishable. □
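The privacy property used in the last step can be checked exhaustively for tiny parameters: with $n$-out-of-$n$ additive sharing, any $n - 1$ of the shares are uniformly distributed, whether the shared value is 0 or some other secret. The Python sketch below is an illustration under assumed toy parameters ($q = 5$, $n = 3$, a hypothetical helper name); it enumerates all sharings and compares the distribution of the visible shares for two different secrets.

```python
from collections import Counter
from itertools import product


def visible_shares_distribution(secret, n, q, hidden):
    """Distribution of all shares except shares[hidden], over all additive sharings of `secret` mod q."""
    counts = Counter()
    for first in product(range(q), repeat=n - 1):            # the first n-1 shares are free
        shares = list(first) + [(secret - sum(first)) % q]   # the last share fixes the sum
        visible = tuple(s for i, s in enumerate(shares) if i != hidden)
        counts[visible] += 1
    return counts


q, n = 5, 3
for hidden in range(n):
    d_zero = visible_shares_distribution(0, n, q, hidden)
    d_secret = visible_shares_distribution(3, n, q, hidden)
    # Both are uniform over all q^(n-1) tuples, so n-1 shares reveal nothing about the secret.
    assert d_zero == d_secret and len(d_zero) == q ** (n - 1) and set(d_zero.values()) == {1}
print("any n-1 of the n shares are uniform, whatever the secret")
```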


B A Lower Bound for the Preprocessing

In this section, we show that any preprocessing matching the properties we have must output the
same amount of data as we do, up to a constant factor. We use the following theorem for 2-party
computation from [29]. It talks about a setting where the parties $A$, $B$ have access to a functionality
that gives a random variable $U$ to $A$ and $V$ to $B$ with some guaranteed joint distribution $P_{UV}$ of
$U, V$. Given this, the parties compute securely a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$, where $A$ holds $x \in \mathcal{X}$
and $B$ holds $y \in \mathcal{Y}$. This function should have the property that there exist inputs $y_1, y_2$ such that
$f(x, y_1) \ne f(x', y_1)$ for all $x \ne x'$, and $f(x, y_2) = f(x', y_2)$ for all $x, x'$. In other words, for some
inputs $B$ learns all of $A$'s input, but for other inputs $B$ learns nothing new.
Theorem 6. Let $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$ be a function with inputs $y_1, y_2$ as above. If there exists a protocol
that computes $f$ securely with access to $P_{UV}$ and with error probability $\varepsilon$ in the semi-honest model,
then

$$H(V) \ge I(U; V) \ge \log|\mathcal{X}| - 7(\varepsilon \log|\mathcal{X}| + h(\varepsilon)),$$

where $h$ denotes the binary entropy function.

We will also need the following technical lemma.

Lemma 1. Let $R$ be a random variable defined over the natural numbers. Then there exists a
constant $C$ such that $E(R) \ge H(R) - 1 - C$.

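Proof (Lemma 1). Let $I := \{ i \in \mathbb{N} : \Pr[R = i] \ge 2^{-i} \}$.

Under such a definition, one can write $H(R)$ as

$$H(R) = \sum_{i \in I} \Pr[R = i] \cdot \log\frac{1}{\Pr[R = i]} + \sum_{i \notin I} \Pr[R = i] \cdot \log\frac{1}{\Pr[R = i]}.$$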

By the construction of $I$, one can bound the first summand as follows:

$$\sum_{i \in I} \Pr[R = i] \cdot \log\frac{1}{\Pr[R = i]} \le \sum_{i \in I} \Pr[R = i] \cdot i \le \sum_{i} \Pr[R = i] \cdot i = E(R).$$

For the second summand one needs to work a bit more. Let $q(i) := \log(1/\Pr[R = i])$. Then

$$\sum_{i \notin I} \Pr[R = i] \cdot \log\frac{1}{\Pr[R = i]} = \sum_{i \notin I} 2^{-q(i)} \cdot q(i).$$

We now claim that

$$2^{-q(i)} \cdot q(i) \le 2^{-i} \cdot i \quad \text{for all } i \notin I \text{ with } i \ge 2.$$

This happens if and only if

$$2^{-q(i)} \cdot 2^{\log(q(i))} \le 2^{-i} \cdot 2^{\log(i)}.$$

Taking the logarithm of such relation one gets $-q(i) + \log(q(i)) \le -i + \log(i)$, which is equivalent
to $q(i) - \log(q(i)) \ge i - \log(i)$.

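Since $q(i) = \log(1/\Pr[R = i]) > i$ for all $i \notin I$, and the function $x \mapsto x - \log x$ is increasing for $x \ge 1/\ln 2 \approx 1.44$,
the latter relation is satisfied for every $i \ge 2$. The remaining terms, with $i \in \{0, 1\}$, have the form $x \cdot 2^{-x}$,
which never exceeds $1/(e \ln 2) < 0.54$. Therefore, one can bound the second summand by $C_0 + \sum_{i \ge 2} 2^{-i} \cdot i$,
where $C_0 < 1.1$ collects the terms with $i \in \{0, 1\}$. Moreover $\sum_{i \ge 2} 2^{-i} \cdot i = 3/2$, so the second summand
can be bounded by $1 + C$ with the constant $C := C_0 + 1/2$.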

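Finally, one can reassemble all the reasoning into one and get

$$H(R) = \sum_{i \in I} \Pr[R = i] \cdot \log\frac{1}{\Pr[R = i]} + \sum_{i \notin I} \Pr[R = i] \cdot \log\frac{1}{\Pr[R = i]} \le E(R) + 1 + C.$$

The last inequality implies that $E(R) \ge H(R) - 1 - C$. □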
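A quick numerical sanity check of Lemma 1 is sketched below; the distributions are arbitrary choices. For $R$ on $\{0, 1, 2, \ldots\}$, Jensen's inequality gives $H(R) - E(R) = \sum_i \Pr[R = i] \log(2^{-i}/\Pr[R = i]) \le \log \sum_i 2^{-i} = 1$, so the difference should never exceed 1 bit, with equality for $\Pr[R = i] = 2^{-(i+1)}$. This is consistent with the lemma, which only asks for some constant $C$.

```python
import math


def entropy_minus_mean(probs):
    """Return H(R) - E(R) in bits, where Pr[R = i] = probs[i] for i = 0, 1, 2, ..."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    e = sum(i * p for i, p in enumerate(probs))
    return h - e


def geometric(q, n):
    """Truncated geometric distribution Pr[R = i] proportional to (1 - q) * q**i, i < n."""
    w = [(1 - q) * q**i for i in range(n)]
    s = sum(w)
    return [x / s for x in w]


dists = {
    "point mass at 0": [1.0],
    "uniform on {0..7}": [1 / 8] * 8,
    "geometric q=1/2": geometric(0.5, 200),
    "geometric q=0.9": geometric(0.9, 2000),
}
for name, probs in dists.items():
    print(f"{name:20s} H - E = {entropy_minus_mean(probs):.4f}")
```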
With this result, we can prove the lower bound claimed earlier:

Proof (Theorem 2). Suppose we have an on-line protocol $\pi$ that satisfies the assumptions in the
theorem. Consider any player $P_i$ and suppose we want to compute the function

$$f_T((x, x'), y) = y x + (1 - y) x'.$$

Here $y \in \mathbb{F}_{p^k}$ and $x, x'$ are vectors over $\mathbb{F}_{p^k}$ of length $T$. $P_i$ will have input $y$ and each $P_j$, $j \ne i$, will
have as input substrings $x_j, x'_j$ such that the concatenation of all $x_j$ ($x'_j$) is $x$ ($x'$). Finally, only
$P_i$ learns the output $f_T((x, x'), y)$.

Clearly, $f_T$ can be computed using a circuit of size $O(T)$, and this will be the circuit promised
in the theorem. Note that our assumed protocol $\pi$ can handle circuits of size $S$ and can therefore
compute $f_T$ securely for some $T = \Theta(S)$.

We can now transform $\pi$ to a two-party protocol $\pi'$ for parties $A$ and $B$. $A$ has input $x, x'$, $B$
has input $y$, and $B$ is supposed to learn $f_T((x, x'), y)$. Now, $\pi'$ simply consists of running $\pi$ where $B$
emulates $P_i$ and $A$ emulates all other players. We give to $B$ whatever $P_i$ gets from the preprocessing
and $A$ gets whatever the other players receive, so this defines the random variables $U$ and $V$. Since
$\pi$ is secure if $P_i$ is corrupt and also if all other players are corrupt, this trivially means that $\pi'$ is
an actively secure two-party protocol for computing $f_T$.

We would like to conclude that $\pi'$ also computes $f_T$ with passive security. As noted in [29], this is actually not
necessarily the case for all functions. The problem is that if the adversary is passive, then active
security does guarantee that there is a simulator for this case, but such a simulator is allowed to
change the inputs of corrupted parties. A simulator for the passive case is not allowed to do this.
However, [29] observe that for some functions an active simulator cannot get away with changing
the inputs, as this would make it impossible to simulate correctly. They show this is the case for
Oblivious Transfer, which is essentially what $f_T$ is after we go to the 2-party case. We may therefore
assume $\pi'$ is also passively secure.

Finally, we define $f'_T(x, y) = f_T((x, 0), y) = y x$. Obviously $\pi'$ can be used to compute $f'_T$
securely: $A$ just sets her second input to be 0. Moreover, $f'_T$ satisfies the conditions in Theorem
6 (take $y_1 = 1$ and $y_2 = 0$). So we get that $H(V) \ge \log|\mathcal{X}| - 7(\varepsilon \log|\mathcal{X}| + h(\varepsilon))$. If we adopt the standard convention that
the security parameter grows linearly with the input size $\log|\mathcal{X}|$, then, because $\varepsilon$ is negligible in the
security parameter, the "error term" $7(\varepsilon \log|\mathcal{X}| + h(\varepsilon))$ is $o(\log|\mathcal{X}|)$.

So we get that $H(V)$ is $\Omega(\log|X|) = \Omega(T \log p^k) = \Omega(S \log p^k)$, since $T$ is $O(S)$. Recalling that $H(V)$ is actually the entropy of the variable $P_1$ received in the original protocol $\pi$, we get the first conclusion of the Theorem.

For the second conclusion, about the computational work done, it is tempting to simply claim that $B$ has to at least read the information he is given, and so $H(V)$ is a lower bound on the expected number of bit operations. But this is not enough: it is conceivable that in every particular execution, $B$ might only have to read a small part of the information.

It turns out that this does not happen, however, which can be argued as follows. Let $B(V)$ be the random variable representing the bits of $V$ that $B$ actually reads. By inspection of the proof of Theorem 6, one sees that if we replace $V$ everywhere by $B(V)$ the same proof still applies. So in fact we have $H(B(V)) \ge \log|X| - 7(\epsilon \log|X| + h(\epsilon))$. Now let $R$ be the random variable representing the number of bits $B$ reads from $V$.

If we condition on $R$, then the entropy of $B(V)$ cannot drop by more than $H(R)$, so we have
$$H(B(V) \mid R) \ge H(B(V)) - H(R) \ge \log|X| - 7(\epsilon \log|X| + h(\epsilon)) - H(R).$$
Moreover, we also have
$$H(B(V) \mid R) = \sum_r \Pr(R = r) \cdot H(B(V) \mid R = r) \le \sum_r \Pr(R = r) \cdot r = E(R),$$
since, given $R = r$, $B(V)$ is a string of $r$ bits and so has entropy at most $r$. Putting these two inequalities together, we obtain that
$$E(R) + H(R) \ge \log|X| - 7(\epsilon \log|X| + h(\epsilon)).$$

Now, either $E(R) \ge (\log|X| - 7(\epsilon \log|X| + h(\epsilon)))/2$ or $H(R) \ge (\log|X| - 7(\epsilon \log|X| + h(\epsilon)))/2$. In the latter case we have from Lemma 1 that $E(R)$ is much larger than $H(R)$, so we can certainly conclude that $E(R) \ge (\log|X| - 7(\epsilon \log|X| + h(\epsilon)))/2$ in any case. As above, the error term depending on $\epsilon$ becomes negligible for increasing security parameter, so we get that $E(R)$ is $\Omega(S \log p^k)$, as desired. □

C  Canonical  Embeddings  of  Cyclotomic  Fields 

Our concrete instantiation will use some basic results on cyclotomic fields, which we now recap; these results are needed for the main result of this Appendix, which is a proof of a “folklore” result about the relationship between norms in the canonical and polynomial embeddings of a cyclotomic field. This result is used repeatedly in our main construction to produce estimates on the size of the parameters needed.

C.1 Cyclotomic Fields

We first recap some basic facts about number fields and their canonical embeddings, focusing particularly on the case of cyclotomic fields.

Number Fields An algebraic number (resp. algebraic integer) $\theta \in \mathbb{C}$ is the root of a polynomial (resp. monic polynomial) with coefficients in $\mathbb{Q}$ (resp. $\mathbb{Z}$). The minimal polynomial of $\theta$ is the unique monic irreducible $f(X) \in \mathbb{Q}[X]$ which has $\theta$ as a root.

A number field $K = \mathbb{Q}(\theta)$ is the field obtained by adjoining powers of an algebraic number $\theta$ to $\mathbb{Q}$. If $\theta$ has minimal polynomial $f(X)$ of degree $N$, then $K$ can be considered as a vector space over $\mathbb{Q}$, of dimension $N$, with basis $\{1, \theta, \ldots, \theta^{N-1}\}$. Note that this “coefficient embedding” is relative to the defining polynomial $f(X)$. Equivalently we have $K = \mathbb{Q}[X]/f(X)$, i.e. the field of rational polynomials of degree less than $N$, modulo the polynomial $f(X)$. Without loss of generality we can assume from now on that $K$ is defined by a monic irreducible integral polynomial of degree $N$. The ring of integers $\mathcal{O}_K$ of $K$ is defined to be the subring of $K$ consisting of all elements whose minimal polynomial has integer coefficients.

Canonical Embedding There are $N$ field morphisms $\sigma_i : K \to \mathbb{C}$ which fix every element of $\mathbb{Q}$. Such a morphism is called a complex embedding, and it takes $\theta$ to each distinct complex root of $f(X)$. The number field $K$ is said to have signature $(s_1, s_2)$ if the defining polynomial has $s_1$ real roots and $s_2$ complex conjugate pairs of roots; clearly $N = s_1 + 2 \cdot s_2$. The roots are numbered in the standard way so that $\sigma_i(\theta) \in \mathbb{R}$ for $1 \le i \le s_1$ and $\sigma_{j+s_1+s_2}(\theta) = \overline{\sigma_{j+s_1}(\theta)}$ for $1 \le j \le s_2$. We define $\sigma = (\sigma_1, \ldots, \sigma_N)$, which defines the canonical embedding of $K$ into $\mathbb{R}^{s_1} \times \mathbb{C}^{2 s_2}$, where the field operations in $K$ are mapped into componentwise addition and multiplication in $\mathbb{R}^{s_1} \times \mathbb{C}^{2 s_2}$. To ease notation we will often write $\alpha_i = \sigma_i(\alpha)$, for $\alpha \in K$. We will let $\|\alpha\|_p$, for $p \in [1, \ldots, \infty]$, denote the $p$-norm of $\alpha$ in the coefficient embedding (i.e. the $p$-norm of the vector of coefficients), and let $\|\sigma(\alpha)\|_p$ denote norms in the canonical embedding.
Cyclotomic Fields We will mainly be concerned with cyclotomic number fields. The $m$th cyclotomic polynomial is given by $\Phi_m(X) = \prod_{i \in (\mathbb{Z}/m\mathbb{Z})^*} (X - \zeta_m^i)$, where $\zeta_m$ is a primitive $m$th root of unity; this is an irreducible polynomial of degree $N = \phi(m)$. The number field defined by $\Phi_m(X)$ is said to be a cyclotomic number field, and is given by $K = \mathbb{Q}(\zeta_m)$, where $\zeta_m$ is an $m$th root of unity, i.e. a root of $\Phi_m(X)$. The ring of integers of $K$ is equal to $\mathbb{Z}[\zeta_m]$. The number field $K$ is Galois, and hence (importantly for us) the polynomial $\Phi_m(X)$ splits modulo $p$ (for any prime $p$ not dividing $m$) into a product of distinct irreducible polynomials, all of the same degree.

The key fact is that if $\Phi_m(X)$ has degree $d$ factors modulo the prime $p$, then $m$ divides $p^d - 1$. To see this, notice that if $\Phi_m(X)$ factors into $N/d$ factors, each of degree $d$, then the finite field $\mathbb{F}_{p^d}$ must contain the $m$th roots of unity, and so $m$ divides $p^d - 1$. In the other direction, if $d$ is the smallest integer such that $m$ divides $p^d - 1$, then $\Phi_m(X)$ will have a degree $d$ factor, since the decomposition group of the prime $p$ in the Galois group will have order $d$.
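The splitting criterion above can be checked mechanically. The following Python sketch (the helper names `euler_phi`, `factor_degree` and `splitting` are ours, for illustration only) computes $N = \phi(m)$, the common degree $d$ of the irreducible factors of $\Phi_m(X)$ modulo a prime $p \nmid m$ as the multiplicative order of $p$ modulo $m$, and the number of factors $N/d$. It uses naive trial division and a linear order search, which is adequate for the moduli used later in this appendix but is not meant to be efficient.

```python
from math import gcd


def euler_phi(m: int) -> int:
    """Euler's totient phi(m), via trial-division factorisation."""
    result, n, f = m, m, 2
    while f * f <= n:
        if n % f == 0:
            while n % f == 0:
                n //= f
            result -= result // f
        f += 1
    if n > 1:
        result -= result // n
    return result


def factor_degree(m: int, p: int) -> int:
    """Smallest d >= 1 with p^d = 1 (mod m): the degree of every irreducible
    factor of Phi_m(X) modulo the prime p (requires p not dividing m)."""
    if m < 2 or gcd(m, p) != 1:
        raise ValueError("need m >= 2 and gcd(m, p) = 1")
    d, x = 1, p % m
    while x != 1:
        x = (x * p) % m
        d += 1
    return d


def splitting(m: int, p: int) -> tuple[int, int, int]:
    """Return (N, d, N // d): degree of Phi_m, factor degree, number of factors."""
    N = euler_phi(m)
    d = factor_degree(m, p)
    return N, d, N // d


print(splitting(7, 2))      # (6, 3, 2): Phi_7 is a product of two cubics over F_2
print(splitting(17425, 2))  # (12800, 40, 320)
```

The second call corresponds to the modulus $m = 17425$ used for $\mathbb{F}_{2^8}$ later in this appendix.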


C.2  Relating  Norms  Between  Canonical  and  Polynomial  Embeddings 

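There is a distinct difference between the canonical and polynomial embeddings of a number field. In particular, notice the following expansions upon multiplication, for $x, y \in \mathcal{O}_K$:
$$\|x \cdot y\|_\infty \le \delta_\infty \cdot \|x\|_\infty \cdot \|y\|_\infty, \qquad \|\sigma(x \cdot y)\|_p \le \|\sigma(x)\|_\infty \cdot \|\sigma(y)\|_p,$$
where
$$\delta_\infty = \sup\left\{ \frac{\|a(X) \cdot b(X) \bmod f(X)\|_\infty}{\|a(X)\|_\infty \cdot \|b(X)\|_\infty} : a, b \in \mathbb{Z}[X],\ \deg(a), \deg(b) < N \right\}.$$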

In this section we show that one can more tightly control the expansion factor of elements in the polynomial representation, as long as they are drawn randomly with a discrete Gaussian distribution. In particular we prove the following theorem; this result is well known to people working in ideal lattice theory, but proofs have not yet appeared in any paper.


Theorem 7. Let $K$ denote a cyclotomic number field; then there is a constant $C_m$, depending only on $m$, such that for all $\alpha \in \mathcal{O}_K$ we have

- $\|\sigma(\alpha)\|_\infty \le \|\alpha\|_1$,
- $\|\alpha\|_\infty \le C_m \cdot \|\sigma(\alpha)\|_\infty$.


We recall some facts about various matrices associated with roots of unity; see [27] and the full version of [22]. First some notation: for any integer $m \ge 2$ we set $\zeta_m = \exp(2 \cdot \pi \cdot \sqrt{-1}/m)$ to be a primitive $m$th root of unity. As usual we let $N = \phi(m)$, and we define $\mathbb{Z}_m^* = \{a_{m,i} : 0 \le i < N\}$ to be a complete set of representatives for $(\mathbb{Z}/m\mathbb{Z})^*$ with $1 \le a_{m,i} < m$. For matrices $A$ and $B$ we let $A \otimes B$ denote the Kronecker product, and we let $I_t$ denote the $t \times t$ identity matrix. All $a \times b$ matrices $M$ in this section will have elements $m_{ij}$ indexed by $0 \le i < a$ and $0 \le j < b$, i.e. we index from zero; this makes some of the expressions easier to write down. The infinity norm of a matrix $M = (m_{ij})$ is defined by
$$\|M\|_\infty := \max_{0 \le i < N} \sum_{j=0}^{N-1} |m_{ij}|.$$
We define the $N \times N$ CRT matrix as follows:
$$\mathrm{CRT}_m = \left(\zeta_m^{a_{m,i} \cdot j}\right)_{0 \le i, j < N}.$$
We then define the constant $C_m$ in the above theorem as $C_m = \|\mathrm{CRT}_m^{-1}\|_\infty$, from which the proof now immediately follows.


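Proof (Theorem 7). For a cyclotomic field the canonical embedding is given by the map $\sigma(\alpha) = \mathrm{CRT}_m \cdot a$, where $a$ is the vector of the coefficient embedding of $\alpha$, i.e. $\alpha$ considered as a polynomial in $\theta$, a root of $F(X) = \Phi_m(X)$, and $\mathrm{CRT}_m$ is the matrix defined earlier, i.e. it is equal to
$$\mathrm{CRT}_m = \begin{pmatrix} 1 & \theta^{(1)} & \cdots & (\theta^{(1)})^{N-1} \\ \vdots & \vdots & & \vdots \\ 1 & \theta^{(N)} & \cdots & (\theta^{(N)})^{N-1} \end{pmatrix},$$
where $\theta^{(i)} = \sigma_i(\theta)$ runs over the primitive $m$th roots of unity. For the first part of the theorem we note that, on writing $\alpha = \sum_{j=0}^{N-1} x_j \cdot \theta^j$, we have
$$|\sigma_i(\alpha)| = \left| \sum_{j=0}^{N-1} x_j \cdot (\theta^{(i)})^j \right| \le \sum_{j=0}^{N-1} |x_j| \cdot |\theta^{(i)}|^j = \sum_{j=0}^{N-1} |x_j| = \|\alpha\|_1,$$
since $|\theta^{(i)}| = 1$. For the second part we note that, for all $\beta \in \mathcal{O}_K$,
$$\|\beta\|_\infty = \|\mathrm{CRT}_m^{-1} \cdot \sigma(\beta)\|_\infty \le \|\mathrm{CRT}_m^{-1}\|_\infty \cdot \|\sigma(\beta)\|_\infty = C_m \cdot \|\sigma(\beta)\|_\infty,$$
from which the result follows. □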
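As a numerical sanity check of the first part of Theorem 7, the following Python sketch evaluates $\sigma(\alpha) = \mathrm{CRT}_m \cdot a$ directly, by substituting each primitive root $\zeta_m^u$, $u \in (\mathbb{Z}/m\mathbb{Z})^*$, into the coefficient vector of $\alpha$, and compares $\|\sigma(\alpha)\|_\infty$ with $\|\alpha\|_1$. The function name and the choice $m = 15$ are illustrative assumptions. Since $\Phi_{15}(X)$ has no real roots, the embeddings also come in complex-conjugate pairs, matching the numbering convention of the canonical embedding.

```python
import cmath
import math
import random


def canonical_embedding(coeffs: list[int], m: int) -> list[complex]:
    """sigma(alpha) = CRT_m * a: evaluate sum_j a_j X^j at zeta_m^u for each unit u mod m."""
    units = [u for u in range(1, m) if math.gcd(u, m) == 1]
    return [sum(x * cmath.exp(2j * math.pi * u * j / m) for j, x in enumerate(coeffs))
            for u in units]


m = 15                                     # N = phi(15) = 8
rng = random.Random(0)
alpha = [rng.randint(-5, 5) for _ in range(8)]
sigma = canonical_embedding(alpha, m)

l1 = sum(abs(x) for x in alpha)            # ||alpha||_1 in the coefficient embedding
linf_sigma = max(abs(z) for z in sigma)    # ||sigma(alpha)||_infinity
print(linf_sigma <= l1 + 1e-9)             # True: first part of Theorem 7
```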


The key question then is how large $C_m$ can become. We now turn to this problem, giving a partial answer.


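The $m \times m$ DFT matrix is defined by
$$\mathrm{DFT}_m := \left(\zeta_m^{i \cdot j}\right)_{0 \le i, j < m}.$$
Let $m'$ be a divisor of $m$; then for $i \in \{0, \ldots, m-1\}$ we write $i_0 = i \bmod m'$ and $i_1 = (i - i_0)/m'$. We then define the $m \times m$ “twiddle matrix” to be the diagonal matrix
$$T_{m,m'} := \mathrm{Diag}\{\zeta_m^{i_0 \cdot i_1}\}_{i = 0, \ldots, m-1}.$$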

Finally we define $L^m_{m'}$ to be the permutation matrix which fixes the row with index $m - 1$, but sends all other rows $i$, for $0 \le i < m - 1$, to row $i \cdot m' \bmod (m - 1)$. Following [27] we use these matrices to decompose the matrix $\mathrm{DFT}_m$ into $\mathrm{DFT}_{m'}$ and $\mathrm{DFT}_k$, where $m = m' \cdot k$, via the following identity:
$$\mathrm{DFT}_m = L^m_{m'} \cdot (I_k \otimes \mathrm{DFT}_{m'}) \cdot T_{m,m'} \cdot (\mathrm{DFT}_k \otimes I_{m'}). \qquad (3)$$


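This is nothing but the general Cooley-Tukey decomposition of the DFT for composite $m$. Consider the Vandermonde matrix
$$V(x_1, \ldots, x_m) = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{m-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{m-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{m-1} \end{pmatrix}.$$
It is clear that $\mathrm{DFT}_m = V(1, \zeta_m, \zeta_m^2, \ldots, \zeta_m^{m-1})$.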
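The identity (3) can be checked numerically. The Python sketch below is an illustration, not code from [27]: it applies the four factors of (3) right to left to a vector and compares the result with a direct evaluation of $\mathrm{DFT}_m$. The index conventions are our assumption: Kronecker products are indexed as $i = p \cdot m' + q$, and we read $L^m_{m'}$ as the permutation whose output entry $j < m - 1$ is input entry $j \cdot m' \bmod (m - 1)$.

```python
import cmath


def zeta(m: int, e: int) -> complex:
    """zeta_m^e with zeta_m = exp(2*pi*sqrt(-1)/m)."""
    return cmath.exp(2j * cmath.pi * e / m)


def dft(x: list[complex]) -> list[complex]:
    """Direct O(m^2) product DFT_m * x, with DFT_m[i][j] = zeta_m^(i*j)."""
    m = len(x)
    return [sum(zeta(m, i * j) * x[j] for j in range(m)) for i in range(m)]


def cooley_tukey(x: list[complex], m_prime: int) -> list[complex]:
    """Evaluate L * (I_k (x) DFT_m') * T_{m,m'} * (DFT_k (x) I_m') * x, where m = m' * k."""
    m = len(x)
    k = m // m_prime
    assert k * m_prime == m and m >= 2
    # (DFT_k (x) I_m'): a length-k DFT on each stride-m' subsequence.
    z = [0j] * m
    for q in range(m_prime):
        col = dft([x[r * m_prime + q] for r in range(k)])
        for p in range(k):
            z[p * m_prime + q] = col[p]
    # T_{m,m'}: multiply entry i by zeta_m^(i0 * i1), i0 = i mod m', i1 = i div m'.
    w = [zeta(m, (i % m_prime) * (i // m_prime)) * z[i] for i in range(m)]
    # (I_k (x) DFT_m'): a length-m' DFT on each contiguous block of m' entries.
    u: list[complex] = []
    for p in range(k):
        u.extend(dft(w[p * m_prime:(p + 1) * m_prime]))
    # L: output j < m - 1 reads input j * m' mod (m - 1); index m - 1 is fixed.
    return [u[(j * m_prime) % (m - 1)] for j in range(m - 1)] + [u[m - 1]]


x = [complex(i % 3, -i) for i in range(12)]
err = max(abs(a - b) for a, b in zip(cooley_tukey(x, 3), dft(x)))
print(err < 1e-9)   # True
```

Each step costs $O(m \cdot (m' + k))$ instead of $O(m^2)$; applying the same split recursively gives the usual fast Fourier transform.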
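Lemma 2. We have, for any $m$,
$$\mathrm{DFT}_m^{-1} = \frac{1}{m} \cdot V(1, \zeta_m^{-1}, \zeta_m^{-2}, \ldots, \zeta_m^{-(m-1)}).$$

Proof. Let $\delta_{ij}$ be defined so that $\delta_{ij} = 0$ if $i \ne j$ and equal to one otherwise. We have
$$\left(V(1, \zeta_m, \ldots, \zeta_m^{m-1}) \cdot V(1, \zeta_m^{-1}, \ldots, \zeta_m^{-(m-1)})\right)_{ij} = \sum_{0 \le k < m} \zeta_m^{i \cdot k} \cdot \zeta_m^{-k \cdot j} = \sum_{0 \le k < m} \zeta_m^{k \cdot (i - j)} = m \cdot \delta_{ij},$$
since for $i \ne j$ the last sum is a geometric series in the $m$th root of unity $\zeta_m^{i-j} \ne 1$, and therefore vanishes. □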


This leads to the following lemma, which shows that the infinity norm of the inverse of the DFT matrix is always equal to one.

Lemma 3. For any $m$ we have $\|\mathrm{DFT}_m^{-1}\|_\infty = 1$.

Proof. If $\zeta_m$ is an $m$th root of unity, it is clear that $\|V(1, \zeta_m, \zeta_m^2, \ldots, \zeta_m^{m-1})\|_\infty = m$, since every entry has absolute value one. In addition, $\psi_m = 1/\zeta_m$ is also an $m$th root of unity; thus, by Lemma 2,
$$\|\mathrm{DFT}_m^{-1}\|_\infty = \frac{1}{m} \cdot \|V(1, \psi_m, \psi_m^2, \ldots, \psi_m^{m-1})\|_\infty = \frac{m}{m} = 1.$$
□
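Lemmas 2 and 3 are easy to confirm numerically for small $m$. The Python sketch below (the helper names are ours) builds $\mathrm{DFT}_m$ and the claimed inverse $\frac{1}{m} V(1, \zeta_m^{-1}, \ldots, \zeta_m^{-(m-1)})$ as plain lists of lists, and reports both the largest deviation of their product from the identity and the infinity norm of the inverse.

```python
import cmath


def vandermonde(xs: list[complex]) -> list[list[complex]]:
    """V(x_1, ..., x_m): row i is (1, x_i, x_i^2, ..., x_i^(m-1))."""
    m = len(xs)
    return [[x ** j for j in range(m)] for x in xs]


def matmul(A: list[list[complex]], B: list[list[complex]]) -> list[list[complex]]:
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inf_norm(M: list[list[complex]]) -> float:
    """Maximum absolute row sum, as defined above."""
    return max(sum(abs(v) for v in row) for row in M)


def check_lemmas(m: int) -> tuple[float, float]:
    zeta = cmath.exp(2j * cmath.pi / m)
    dft = vandermonde([zeta ** i for i in range(m)])
    dft_inv = [[v / m for v in row] for row in vandermonde([zeta ** (-i) for i in range(m)])]
    prod = matmul(dft, dft_inv)
    # Largest deviation of DFT_m * (claimed inverse) from the identity (Lemma 2).
    err = max(abs(prod[i][j] - (1 if i == j else 0)) for i in range(m) for j in range(m))
    return err, inf_norm(dft_inv)   # the second value should be 1 (Lemma 3)


for m in (2, 5, 8, 12):
    print(m, check_lemmas(m))
```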


Let $m = p_1^{e_1} \cdots p_s^{e_s}$; we define $r = p_1 \cdots p_s$ and $m_1 = m/r$, hence $N = \phi(m) = \phi(r) \cdot m_1$. In [22] the authors specialise the decomposition (3) (by selecting appropriate rows and columns) in the case $m' = m_1$ and $k = r$, to show that, up to a permutation of the rows, the matrix $\mathrm{CRT}_m$ is equal to
$$(I_{\phi(r)} \otimes \mathrm{DFT}_{m_1}) \cdot T'_{m,m_1} \cdot (\mathrm{CRT}_r \otimes I_{m_1}),$$
where $T'_{m,m_1}$ is another diagonal matrix consisting of roots of unity. We then have the following lemma.

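Lemma 4. For an integer $m \ge 2$ such that $m = p_1^{e_1} \cdots p_s^{e_s}$, we write $r = p_1 \cdots p_s$; we then have $C_m \le C_r$.

Proof. As above we write $m_1 = m/r$. First note that $\|A \otimes I_t\|_\infty = \|I_s \otimes A\|_\infty = \|A\|_\infty$ for any matrix $A$ and any integers $s$ and $t$. Then also note that, since $\mathrm{CRT}_m$ is given, up to a permutation of the rows, by the above decomposition, $\mathrm{CRT}_m^{-1}$ is given, up to a permutation of the columns (which does not change the infinity norm), by
$$(\mathrm{CRT}_r^{-1} \otimes I_{m_1}) \cdot T'^{-1}_{m,m_1} \cdot (I_{\phi(r)} \otimes \mathrm{DFT}_{m_1}^{-1}).$$
Since $T'^{-1}_{m,m_1}$ is diagonal with entries of absolute value one, its infinity norm is one, and so, using Lemma 3, we have
$$\|\mathrm{CRT}_m^{-1}\|_\infty \le \|\mathrm{CRT}_r^{-1} \otimes I_{m_1}\|_\infty \cdot \|T'^{-1}_{m,m_1}\|_\infty \cdot \|I_{\phi(r)} \otimes \mathrm{DFT}_{m_1}^{-1}\|_\infty = \|\mathrm{CRT}_r^{-1}\|_\infty \cdot 1 \cdot \|\mathrm{DFT}_{m_1}^{-1}\|_\infty = \|\mathrm{CRT}_r^{-1}\|_\infty.$$
□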
This result means that we can bound $C_m$ for infinite families of values of $m$ by simply deducing a bound on $C_r$, where $r$ is the product of all primes dividing $m$. For example, notice that $\mathrm{CRT}_2 = (1)$, and hence $C_{2^e} = C_2 = 1$ for all values of $e$. Indeed, it is relatively straightforward to determine the exact value of $C_p$ for a prime $p$:


Lemma 5. If $p$ is a prime then
$$C_p = \frac{2 \cdot \sin(\pi/p)}{p \cdot (1 - \cos(\pi/p))}.$$


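Proof.¹ First note that it is a standard fact from algebra (by considering inverses of Vandermonde matrices, for example) that the entries of a row of the matrix $\mathrm{CRT}_p^{-1}$ are given by the coefficients of the polynomial
$$f_{\zeta_p}(X) = \frac{\Phi_p(X)}{\Phi_p'(\zeta_p) \cdot (X - \zeta_p)}, \qquad (4)$$
where each row uses a different root of unity $\zeta_p$. We then note that
$$\Phi_p'(\zeta_p) = (\zeta_p - \zeta_p^2) \cdot (\zeta_p - \zeta_p^3) \cdots (\zeta_p - \zeta_p^{p-1}) = \zeta_p^{p-2} \cdot (1 - \zeta_p) \cdot (1 - \zeta_p^2) \cdots (1 - \zeta_p^{p-2}) = \zeta_p^{p-2} \cdot \frac{\Phi_p(1)}{1 - \zeta_p^{p-1}} = \frac{\zeta_p^{p-2} \cdot p}{1 - 1/\zeta_p} = \frac{p}{\zeta_p^2 - \zeta_p}.$$
Thus the coefficients of the polynomial in (4) are given by $\zeta_p \cdot (\zeta_p^r - 1)/p$ for $r = 1, \ldots, p - 1$, where each row of our matrix is given by a different $p$th root $\zeta_p$.

Thus to determine the infinity norm of $\mathrm{CRT}_p^{-1}$ we simply need to sum the absolute values of these coefficients for the first row, since all other rows will be equal:
$$\frac{1}{p} \sum_{r=1}^{p-1} |\zeta_p^r - 1| = \frac{1}{p} \sum_{r=1}^{p-1} \sqrt{2 - 2 \cdot \cos(2 r \pi/p)} = \frac{1}{p} \sum_{r=1}^{p-1} 2 \cdot \sin(r \pi/p) = \frac{2 \cdot \sin(\pi/p)}{p \cdot (1 - \cos(\pi/p))},$$
where the last step uses $\sum_{r=1}^{p-1} \sin(r\pi/p) = \cot(\pi/(2p)) = \sin(\pi/p)/(1 - \cos(\pi/p))$.
□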


In practice this result means that $C_p \approx 4/\pi \approx 1.2732$ for all $p > 11$.

If $m$ is odd, then we see that, subject to a permutation of the rows, the matrices $\mathrm{CRT}_{2m}$ and $\mathrm{CRT}_m$ are identical up to a multiple of $-1$ for every second column. Thus we have $C_{2m} = C_m$ for odd values of $m$.

We find that $C_r < 8.6$ for squarefree $r < 400$, which provides a relatively small upper bound on $C_m$ for an infinite family of cyclotomic fields $K$. It appears that the size of $C_m$ depends crucially on the number of prime factors of $m$. Thus it is an interesting open question to provide a tight upper bound on $C_m$. Indeed, the growth in $C_m$ seems to be closely related to the growth in the coefficients of the polynomial $\Phi_m(X)$, which also depends on the number of prime factors of $m$.
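The constant $C_m = \|\mathrm{CRT}_m^{-1}\|_\infty$ can be computed directly for small $m$. The Python sketch below (standard library only; the function names are ours) builds $\mathrm{CRT}_m$ with rows indexed by the units $a \in (\mathbb{Z}/m\mathbb{Z})^*$ and columns by the powers $j < N$, inverts it by Gauss-Jordan elimination with partial pivoting, and takes the maximum absolute row sum. This lets one compare against Lemma 5, the power-of-two case $C_{2^e} = 1$, the relation $C_{2m} = C_m$ for odd $m$, and the bound $C_m \le C_r$ of Lemma 4. The cubic-cost elimination is only meant for small $m$, not for values such as $m = 17425$.

```python
import cmath
import math


def crt_matrix(m: int) -> list[list[complex]]:
    """CRT_m: one row per unit a mod m, columns j < phi(m), entry zeta_m^(a*j)."""
    units = [a for a in range(1, m) if math.gcd(a, m) == 1]
    n = len(units)
    return [[cmath.exp(2j * math.pi * a * j / m) for j in range(n)] for a in units]


def invert(M: list[list[complex]]) -> list[list[complex]]:
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def crt_constant(m: int) -> float:
    """C_m = ||CRT_m^{-1}||_inf, the maximum absolute row sum of the inverse."""
    return max(sum(abs(v) for v in row) for row in invert(crt_matrix(m)))


def lemma5(p: int) -> float:
    """Closed form of Lemma 5 for a prime p."""
    return 2 * math.sin(math.pi / p) / (p * (1 - math.cos(math.pi / p)))


for m in (3, 7, 8, 14, 15, 105):
    print(m, round(crt_constant(m), 4))
```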


¹ This proof was provided to us by Robin Chapman.
C.3 Application of the above bounds

An immediate consequence of Theorem 7 is an upper bound on the value $\delta_\infty$ for cyclotomic number fields. Let $\alpha \in \mathcal{O}_K$; then, by the standard inequalities between norms, $\|\alpha\|_1 \le N \cdot \|\alpha\|_\infty$. Thus we have, for $\alpha, \beta \in \mathcal{O}_K$,
$$\|\alpha \cdot \beta\|_\infty \le C_m \cdot \|\sigma(\alpha \cdot \beta)\|_\infty \le C_m \cdot \|\sigma(\alpha)\|_\infty \cdot \|\sigma(\beta)\|_\infty \le C_m \cdot \|\alpha\|_1 \cdot \|\beta\|_1 \le C_m \cdot N^2 \cdot \|\alpha\|_\infty \cdot \|\beta\|_\infty,$$
i.e. $\delta_\infty \le C_m \cdot N^2$. When $m$ is a power of two, since $C_m = 1$ we find the bound $\delta_\infty \le N^2$; however, in this case it is known that $\delta_\infty = N$, thus the above bound is not tight.
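For a power-of-two $m = 2N$ we have $\Phi_m(X) = X^N + 1$, and the claim $\delta_\infty = N$ can be illustrated numerically. In the Python sketch below (the names are ours), the all-ones polynomials attain the ratio $N$, while random nonzero integer polynomials stay at or below it; this only illustrates the claim and does not prove it.

```python
import random


def negacyclic_mul(a: list[int], b: list[int]) -> list[int]:
    """a(X) * b(X) mod X^N + 1."""
    N = len(a)
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
            else:
                c[i + j - N] -= ai * bj   # reduce using X^N = -1
    return c


def expansion_ratio(a: list[int], b: list[int]) -> float:
    """||a * b mod (X^N + 1)||_inf / (||a||_inf * ||b||_inf)."""
    c = negacyclic_mul(a, b)
    return max(map(abs, c)) / (max(map(abs, a)) * max(map(abs, b)))


N = 8
print(expansion_ratio([1] * N, [1] * N))   # 8.0: the all-ones pair attains N
rng = random.Random(3)
ratios = [expansion_ratio([rng.randint(-9, 9) or 1 for _ in range(N)],
                          [rng.randint(-9, 9) or 1 for _ in range(N)]) for _ in range(1000)]
print(max(ratios) <= N)                    # True
```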

A  more  interesting  application,  for  our  purposes,  is  to  bound  the  infinity  norm  in  the  polynomial 
embedding  of  the  product  of  two  elements  which  have  been  selected  with  a  discrete  Gaussian.  To 
demonstrate  this  result  we  will  first  need  to  introduce  the  following  standard  tailbound: 

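Lemma 6. Let $c \ge 1$ and $C = c \cdot \exp((1 - c^2)/2) < 1$; then for any integer $N \ge 1$ and real $s > 0$ we have
$$\Pr_{x \leftarrow D_{\mathbb{Z}^N, s}}\left[\|x\|_2 > \frac{c \cdot s \cdot \sqrt{N}}{\sqrt{2\pi}}\right] < C^N.$$
Note that this implies (taking $c = 2$, so that $C = 2 \cdot e^{-3/2} < 1/2$) that
$$\Pr_{x \leftarrow D_{\mathbb{Z}^N, s}}\left[\|x\|_2 > 2 \cdot r \cdot \sqrt{N}\right] < 2^{-N},$$
where $r = s/\sqrt{2\pi}$. If we therefore select $\alpha, \beta \leftarrow D_{\mathbb{Z}^N, s}$ and consider them as elements of $\mathcal{O}_K$, we then have, with overwhelming probability, that $\|\alpha\|_2, \|\beta\|_2 \le 2 \cdot r \cdot \sqrt{N}$. We then apply the standard inequality between the 2-norm and the 1-norm to deduce $\|\alpha\|_1, \|\beta\|_1 \le 2 \cdot r \cdot N$. We then have that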

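$$\|\alpha \cdot \beta\|_\infty \le C_m \cdot \|\sigma(\alpha \cdot \beta)\|_\infty \le C_m \cdot \|\sigma(\alpha)\|_\infty \cdot \|\sigma(\beta)\|_\infty \le C_m \cdot \|\alpha\|_1 \cdot \|\beta\|_1 \le 4 \cdot C_m \cdot r^2 \cdot N^2.$$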
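The following Python sketch illustrates this bound for a power-of-two $m = 2N$, where $\Phi_m(X) = X^N + 1$ and $C_m = 1$. As a simplifying assumption, it draws coefficients from a rounded continuous Gaussian with standard deviation $r$ rather than from an exact discrete Gaussian sampler; the names are ours. It checks the tail bound $\|\alpha\|_2 \le 2 \cdot r \cdot \sqrt{N}$ from Lemma 6 and the chain $\|\alpha \cdot \beta\|_\infty \le \|\alpha\|_1 \cdot \|\beta\|_1 \le 4 \cdot r^2 \cdot N^2$; printing the observed product norm next to the bound shows how conservative the bound is.

```python
import math
import random


def negacyclic_mul(a: list[int], b: list[int]) -> list[int]:
    """a(X) * b(X) mod X^N + 1, i.e. multiplication in Z[X]/Phi_{2N}(X) for N a power of two."""
    N = len(a)
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
            else:
                c[i + j - N] -= ai * bj      # X^N = -1
    return c


def rounded_gaussian(N: int, r: float, rng: random.Random) -> list[int]:
    """Assumed stand-in for D_{Z^N,s} with r = s / sqrt(2*pi): round N normal samples."""
    return [round(rng.gauss(0.0, r)) for _ in range(N)]


N, r = 256, 3.2
rng = random.Random(1)
alpha, beta = rounded_gaussian(N, r, rng), rounded_gaussian(N, r, rng)

l2_alpha = math.sqrt(sum(x * x for x in alpha))
l1_alpha, l1_beta = sum(map(abs, alpha)), sum(map(abs, beta))
prod_inf = max(map(abs, negacyclic_mul(alpha, beta)))
bound = 4 * 1 * r**2 * N**2                   # 4 * C_m * r^2 * N^2 with C_m = 1

print(l2_alpha <= 2 * r * math.sqrt(N), prod_inf <= l1_alpha * l1_beta <= bound)
print(prod_inf, bound)
```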


D  Security,  Parameter  Choice  and  Performance 

In this Appendix we show that our concrete SHE scheme meets all the security requirements of our MPC protocol, i.e. that it is an admissible cryptosystem. On the way we derive parameter settings, and finally we present some implementation results for the core operations. Recall that a cryptosystem is admissible if it meets the following requirements:

- It is IND-CPA secure.

- It has a KeyGen* function with the required properties.

- It is $(B_{plain}, B_{rand}, \mathcal{C})$-correct, where $B_{plain} = N \cdot \tau \cdot sec^2 \cdot 2^{sec/2+8}$ and $B_{rand} = d \cdot \rho \cdot sec^2 \cdot 2^{sec/2+8}$, and where $\mathcal{C}$, the set of functions we can evaluate on ciphertexts, contains all formulas evaluated in the protocol $\Pi_{PREP}$ (including the identity function). Note that here we choose the values for $B_{plain}$ and $B_{rand}$ that correspond to the most efficient variant of the ZK proofs.

Recall that in the expressions for $B_{plain}$ and $B_{rand}$, $d$ is the dimension of the randomness space, i.e. $d = 3 \cdot N$; $\tau$ is a bound on the infinity norm of valid plaintexts, i.e. $p/2$; and $\rho$ is a bound on the infinity norm of the randomness in validly generated ciphertexts, i.e. $\rho \approx 2 \cdot r \cdot \sqrt{N}$, by the tailbound of Lemma 6.
IND-CPA and KeyGen*’s properties: We first turn to discussing security. Since our scheme is identical (bar the distributed decryption functionality) to that of [7], security can be reduced to the hardness of the following problem.

Definition 2 (PLWE Assumption). For all $sec \in \mathbb{N}$, let $f(X) = f_{sec}(X) \in \mathbb{Z}[X]$ be a polynomial of degree $N = N(sec)$, let $q = q(sec) \in \mathbb{N}$ be a prime integer, let $R = \mathbb{Z}[X]/(f(X))$ and $R_q = R/qR$, and let $\chi$ denote a distribution over the ring $R$. The polynomial LWE assumption states that for any $\ell = poly(sec)$ it holds that
$$\{(a_i, a_i \cdot s + e_i)\}_{i \in [\ell]} \approx_c \{(a_i, u_i)\}_{i \in [\ell]},$$
where $s$ and the $e_i$ are sampled from the distribution $\chi$, and $a_i, u_i$ are uniformly random in $R_q$. We require computational indistinguishability to hold given only $\ell$ samples, for some $\ell = poly(sec)$.

In particular, our scheme is semantically secure if the PLWE problem, instantiated with $f(X) = \Phi_m(X)$ and $\chi$ the discrete Gaussian distribution, is hard. The hardness of the same problem also implies that the output from KeyGen() is computationally indistinguishable from that of KeyGen*().

Thus our first task is to derive relationships between the parameters so as to ensure that the first two properties of being admissible are satisfied, i.e. that the PLWE problem is actually hard to solve. The basic parameters of our scheme are the degree of the associated number field $N = \phi(m)$, the standard deviation $r$ of the Gaussian distribution used, and the modulus $q$. We first turn to estimating $r$; we do this by using the “standard” analysis of the underlying LWE problem.

We first ensure that $r$ is chosen to avoid combinatorial-style attacks. Consider the underlying LWE problem as being given by $s \cdot A + e = v$, where $e$ is the LWE error vector and $A$ is a random $N \times t$ matrix over $\mathbb{F}_q$. In [1] the authors present a combinatorial attack which breaks LWE in time $2^{\tilde{O}(\|e\|_\infty^2)}$ with high probability. Since $e$ is chosen by the discrete Gaussian with standard deviation $r$, if we pick $r$ large enough then this attack should be prevented. Choosing $r \ge 3.2$ will ensure that $r$ is large enough to avoid combinatorial attacks, i.e. that the Gaussian parameter satisfies $\sqrt{2\pi} \cdot r \ge 8$.

We now turn to the distinguishing problem, namely: given $v$, can we determine whether it arises from an LWE sample or from a uniform sample? Here we determine a lower bound on $N$. The natural “attack” against the decision LWE problem is to first find a short vector $w$ in the dual lattice $\Lambda_q(A^T)^*$ and then check whether $w \cdot v^T$ is close to an integer. If it is, then the input vector is an LWE sample; if not, it is random. Thus, to ensure security, following the argument in [23, Section 5.4.1], we require $r \ge 1.5 \cdot q/\|w\|$.

Following the work of [14] we can estimate, as a function of $t$, the size of the output of a lattice reduction algorithm operating on the lattice $\Lambda_q(A^T)^*$. In particular, if the algorithm tries to find a vector with root Hermite factor $\delta$ (thus $\delta$ measures the difficulty of breaking the underlying SHE system; typically one may select $\delta \approx 1.005$, but see later for other choices), then we expect to find a vector $w$ of size
$$\|w\| \approx \min(q, \delta^t \cdot q^{N/t}).$$
Following the analysis of [23], the above quantity is minimized when we select $t = t' := \sqrt{N \log(q)/\log(\delta)}$. This leads us to deduce the lower bound
$$r \ge 1.5 \cdot \max(1, \delta^{-t'} \cdot q^{1 - N/t'}).$$
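To make this bound concrete, the following Python sketch (the names are ours) evaluates $t'$, the predicted $\|w\| \approx \min(q, \delta^{t'} \cdot q^{N/t'})$, and the resulting lower bound on $r$, working with $\log_2 q$ to avoid huge integers. For example, with $\delta = 1.0052$ and $\log_2 q = 430$, the bound is far above 3.2 at $N = 14000$ but just under 3.2 at $N = 14300$, which is consistent with the value $N \ge 14300$ quoted in Example 1 below.

```python
import math


def lattice_bound_on_r(N: int, log2_q: float, delta: float) -> tuple[float, float]:
    """Return (1.5 * max(1, q / ||w||), t') for the dual-lattice distinguishing attack."""
    log2_delta = math.log2(delta)
    t_opt = math.sqrt(N * log2_q / log2_delta)                      # t' = sqrt(N log q / log delta)
    log2_w = min(log2_q, t_opt * log2_delta + N * log2_q / t_opt)   # log2 of min(q, delta^t' q^(N/t'))
    return 1.5 * 2.0 ** (log2_q - log2_w), t_opt


def required_r(N: int, log2_q: float, delta: float) -> float:
    """r >= max(3.2, 1.5 * max(1, delta^(-t') * q^(1 - N/t')))."""
    return max(3.2, lattice_bound_on_r(N, log2_q, delta)[0])


for N in (14000, 14300):
    print(N, lattice_bound_on_r(N, 430, 1.0052), required_r(N, 430, 1.0052))
```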
Noise of a Clean Ciphertext: We now turn to determining the bound, in the infinity norm, of the value obtained in decrypting valid ciphertexts. Consider what happens when we decrypt a clean ciphertext, encrypted via $(c_0, c_1) = \mathrm{Enc}_{pk}(x, r)$, with $r = (u, v, w)$. This looks like a PLWE sample $(c_1, c_0)$, where the “noise” term, for a validly generated clean ciphertext, is given by
$$t = c_0 - s \cdot c_1 = x + p \cdot (e \cdot v + w + s \cdot u).$$
By our estimates in Appendix C.3 we can bound the infinity norm of $t$ by
$$\|t\|_\infty \le \tau + p \cdot (4 \cdot C_m \cdot r^2 \cdot N^2 + 2 \cdot r \cdot \sqrt{N} + 4 \cdot C_m \cdot r^2 \cdot N^2) =: Y.$$


$(B_{plain}, B_{rand}, \mathcal{C})$-correctness: Whilst IND-CPA is about security in relation to validly created ciphertexts, our distributed decryption functionality must be secure even when some ciphertexts are not completely valid. This is why we introduced the notion of $(B_{plain}, B_{rand}, \mathcal{C})$-correctness. We need to pick $B_{plain}$ and $B_{rand}$ so that $B_{plain} \ge N \cdot \tau \cdot sec^2 \cdot 2^{sec/2+8}$ and $B_{rand} \ge d \cdot \rho \cdot sec^2 \cdot 2^{sec/2+8}$.

Since $B_{plain}/\tau \le B_{rand}/\rho$, we estimate that the noise term associated to such a “clean” ciphertext will be bounded by $Y' = (B_{rand}/\rho)^2 \cdot Y = 9 \cdot N^2 \cdot sec^4 \cdot 2^{sec+16} \cdot Y$. In the MPC protocol we only need to be able to evaluate functions of the form
$$(x_1 + \cdots + x_n) \cdot (y_1 + \cdots + y_n) + (z_1 + \cdots + z_n).$$
We can, via the results in Appendix C.3, crudely estimate the size of $B$, from Section 6, needed to ensure valid decryption. Our crude (over-)estimate therefore comes out as
$$B \le \delta_\infty \cdot (n \cdot Y') \cdot (n \cdot Y') + (n \cdot Y') \le C_m \cdot N^2 \cdot n^2 \cdot Y'^2 + n \cdot Y' \le C_m \cdot N^2 \cdot n^2 \cdot C_{sec}^2 \cdot Y^2 + n \cdot C_{sec} \cdot Y =: Z,$$
where $C_{sec} = 9 \cdot N^2 \cdot sec^4 \cdot 2^{sec+16}$. We take $Z$ as the bound, which we then need to scale by $1 + 2^{sec}$ to ensure we have sufficient space to enable the distributed decryption algorithm. Hence, the value of $q$ needs to be selected so that $Z \cdot (1 + 2^{sec}) < q/2$.


So, in summary, we need to choose parameters such that
$$q > 2 \cdot Z \cdot (1 + 2^{sec}), \qquad r \ge \max(3.2,\ 1.5 \cdot \max(1, \delta^{-t'} \cdot q^{1 - N/t'})),$$
where $sec$ is the statistical security parameter, $\delta$ is a measure of how hard it is to break the underlying SHE scheme, and $t' = \sqrt{N \cdot \log(q)/\log(\delta)}$. This leads to a degree of circularity in the dependency of the parameters, but valid parameter sets can be found by a simple search technique.
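The bound chain for $q$ can be evaluated mechanically. The Python sketch below (the names are ours) computes $\log_2$ of $2 \cdot Z \cdot (1 + 2^{sec})$ from $Y$, $C_{sec}$ and $Z$ as written above, taking $\tau = p/2$; the caller supplies $C_m$, for instance the assumed bound $C_m \le 2$. Floating point suffices, since all intermediate values stay well below $2^{1024}$ for the parameter ranges considered here. It is only a rough estimate for exploring the parameter space; we do not claim it reproduces exactly the values of $q$ quoted in the examples below.

```python
import math


def log2_q_lower_bound(N: int, log2_p: float, r: float, n: int, sec: int, C_m: float) -> float:
    """log2 of 2 * Z * (1 + 2^sec), following the bound chain above."""
    p = 2.0 ** log2_p
    tau = p / 2                                            # plaintext bound
    # Y: noise bound for a validly generated clean ciphertext.
    Y = tau + p * (4 * C_m * r**2 * N**2 + 2 * r * math.sqrt(N) + 4 * C_m * r**2 * N**2)
    C_sec = 9 * N**2 * sec**4 * 2.0 ** (sec + 16)          # (B_rand / rho)^2
    Y1 = C_sec * Y                                         # Y'
    Z = C_m * N**2 * n**2 * Y1**2 + n * Y1
    return math.log2(2 * Z * (1 + 2.0 ** sec))


print(round(log2_q_lower_bound(14656, 32, 3.2, 3, 40, 2.0), 1))
```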
Specific Parameter Sets: To determine parameters for fixed values of $p$ and $n$ we proceed as follows. There are two interesting cases: one where $p$ is fixed (i.e. $p = 2$) and one where we only care that $p$ is larger than some bound (i.e. $p > 2^{32}$ or $p > 2^{64}$). The latter case of $p > 2^{64}$ is more interesting, as numbers of this size can be utilized more readily in applications, since we can simulate integer arithmetic without overflow with such numbers. In addition, using such a value of $p$ means we do not need to repeat our ZKPoKs, or replicate the MACs, so as to get a cheating probability of less than $2^{-40}$.

Our method in all cases is to first fix $p$, $n$, $sec$ and $\delta$; we then search, using the above inequalities, for (rough) values of $q$ and $N$ which satisfy them. We then search for exact values of $p$ and $N$ which satisfy our functional requirements on $p$ (i.e. fixed, or greater than some bound), plus $N$ larger than the bound above, such that $N$ is the degree of $F(X) = \Phi_m(X)$ and $F(X)$ splits into at least $s$ factors of degree divisible by $k$ over $\mathbb{F}_p$.

Given this precise value for $N$, we then return to the above inequalities to find exact values of $q$ and $r$. In all our examples below we pick $n = 3$, $sec = 40$, and $\delta = 1.0052$.


Example 1. We first look at $p > 2^{32}$. Our first (approximate) search reveals we need $N \ge 14300$, $q \approx 2^{430}$ and $r = 3.2$ (assuming $C_m \le 2$). We then try to find an optimal value of $N$; this is done by taking increasing primes $p > 2^{32}$ and factoring $p - 1$. The factors of $p - 1$ correspond to values of $m$ such that $\Phi_m(X)$ factors into $\phi(m)$ linear factors modulo $p$. So we want to find a $p$ such that $p - 1$ is divisible by an $m$ with $N = \phi(m) \ge 14300$. A quick search reveals candidates of
$$(p, N, m) = (2^{32} + 32043,\ 14656,\ 14657).$$
Picking $m$ in this way will maximise the value of $s = N$, and hence allow us to perform more operations in parallel. In addition, since $m$ is prime we know, by Lemma 5, that $C_m \approx 1.2732$, thus justifying the assumption $C_m \le 2$ made in deriving the bounds.

Selecting $m$ to be the prime 14657 in addition allows us to evaluate $s = m - 1 = 14656$ runs of the triple production algorithm in parallel. The message expansion factor, given that we require $N \cdot \log_2(q)$ bits to represent $N$ elements in $\mathbb{F}_p$, is given by
$$\frac{N \cdot \log_2(q)}{N \cdot \log_2(p)} = \frac{\log_2(q)}{\log_2(p)} \approx 13.437.$$


Example 2. Performing the same analysis for $p > 2^{64}$, our first naive search of parameters reveals that we need $N \approx 16700$ and $q \approx 2^{500}$. We then search for specific parameters and find that $p = 2^{64} + 4867$ is pretty near to optimum, which results in a prime value of $m$ of 16729. We find the expansion factor is given by
$$\frac{\log_2(q)}{\log_2(p)} \approx \frac{500}{64} \approx 7.81.$$
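The arithmetic behind Examples 1 and 2 is easy to reproduce. The Python sketch below (the helper names are ours) checks that a prime $m$ divides $p - 1$, in which case $\Phi_m(X)$ splits into $s = N = m - 1$ linear factors modulo $p$, and computes the expansion factor $\log_2(q)/\log_2(p)$ from the rounded sizes $\log_2 q \approx 430$ and $500$ and $\log_2 p \approx 32$ and $64$. It does not test $p$ itself for primality.

```python
from typing import Optional


def is_prime(n: int) -> bool:
    """Trial division; adequate for m of the size used here."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True


def linear_split_slots(p: int, m: int) -> Optional[int]:
    """If m is prime and m | p - 1, Phi_m splits into m - 1 linear factors mod p; return that count."""
    if is_prime(m) and (p - 1) % m == 0:
        return m - 1
    return None


def expansion_factor(log2_q: float, log2_p: float) -> float:
    """Ciphertext bits per plaintext bit: N * log2(q) / (N * log2(p))."""
    return log2_q / log2_p


print(linear_split_slots(2**32 + 32043, 14657))              # 14656
print(expansion_factor(430, 32), expansion_factor(500, 64))  # 13.4375 7.8125
```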

Example 3. We now look at the case $p = 2$ and $k = 8$, i.e. we are looking for parameters which would allow us to compute AES circuits in parallel, or more generally circuits over $\mathbb{F}_{2^8}$. Our first approximate search reveals that we need $N \ge 12300$, $q \approx 2^{370}$ and $r = 3.2$. So we now need to determine a value $m$ such that
$$N = \phi(m) \ge 12300, \qquad 2^d \equiv 1 \pmod{m} \quad \text{and} \quad d \equiv 0 \pmod{8},$$
where $d$ is the smallest positive integer with $2^d \equiv 1 \pmod{m}$.
A quick search reveals candidates of
$$(m, N) = (17425, 12800),$$
since $\Phi_{17425}(X)$ factors into $s = 320$ factors of degree $d = 40$ modulo 2. Thus, using this value of $m$, we are able to work with $s = 320$ elements of $\mathbb{F}_{2^8}$ in parallel. The message expansion factor, given that we require $N \cdot \log_2(q)$ bits to represent 320 elements in $\mathbb{F}_{2^8}$, is given by
$$\frac{N \cdot \log_2(q)}{8 \cdot s} \approx \frac{12800 \cdot 370}{8 \cdot 320} = 1850.0.$$
For this value of $m$ we find $C_{17425} \approx 9.414$.

We present below the run-times we have achieved. We time the operations for encrypting and decrypting clean ciphertexts, the time to homomorphically compute $(c_x \boxtimes c_y) \boxplus c_z$, plus the time to decrypt the said result. The times are given in seconds, and in brackets we present the amortized time per finite field element. All timings were performed on an Intel Core-2 6420 running at 2.13 GHz.

Example   Enc time (s)      Dec (clean) time (s)   (c_x ⊠ c_y) ⊞ c_z time (s)   Dec_sk((c_x ⊠ c_y) ⊞ c_z) time (s)
1         0.72 (0.00005)    0.35 (0.00002)         1.43 (0.0001)                0.72 (0.00005)
2         3.13 (0.00019)    1.54 (0.00009)         6.27 (0.0004)                3.15 (0.00018)
3         1.26 (0.00394)    0.60 (0.00188)         2.46 (0.0077)                1.23 (0.00384)

Estimating Equivalent Symmetric Security Level: The above examples were computed using the root Hermite factor $\delta = 1.005$. Mapping this “hardness” parameter for the underlying lattice problem to a specific symmetric security level (e.g. 80-bit or 128-bit security) is something of a “black art” at present.

In [9] the authors derive an estimate for the block size needed to obtain a given root Hermite factor, assuming an efficient BKZ lattice reduction algorithm is used. They then provide estimates of the run time needed for a specific enumeration using this block size. As an example of their analysis, they estimate that a block size of 286 is needed to obtain a root Hermite factor of $\delta = 1.005$.
Then  they  estimate  that  the  run  time  needed  to  perform  the  enumeration  in  a  projected  lattice  of 
such  dimension  (the  key  sub-procedure  of  the  BKZ  algorithm)  takes  time  roughly  between  2®*^  and 
2^^^  operations. Thus a value of $\delta = 1.005$ can be considered secure; however, their estimates are not precise enough to produce parameters associated with a given symmetric security level.

In [20] the authors take a different approach and simply extrapolate run times for the NTL implementation of BKZ. By looking at various LWE instances, they derive the following equation linking the expected run time $T$ of a distinguishing attack and the root Hermite factor $\delta$:
$$\log_2 T = \frac{1.8}{\log_2 \delta} - 110.$$
The problem with this approach is that NTL's implementation of BKZ is very old, and hence is not state-of-the-art; on the other hand, we are able to derive a direct link between $\delta$ and $\log_2 T$. Using this equation we find the following equivalences:
log2 T    80       100      128      196      256
δ         1.0066   1.0059   1.0052   1.0041   1.0034
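The table above can be regenerated by inverting the equation from [20], which gives $\delta = 2^{1.8/(\log_2 T + 110)}$. The following Python sketch does this (the function names are ours). Its output agrees with the table to within about $10^{-4}$; the tabulated values appear to be rounded or truncated to four decimal places.

```python
import math


def delta_for_security(log2_T: float) -> float:
    """Invert log2 T = 1.8 / log2(delta) - 110 for delta."""
    return 2.0 ** (1.8 / (log2_T + 110))


def security_for_delta(delta: float) -> float:
    """log2 T = 1.8 / log2(delta) - 110."""
    return 1.8 / math.log2(delta) - 110


for bits in (80, 100, 128, 196, 256):
    print(bits, round(delta_for_security(bits), 6))
```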

Using these estimates for $\delta$ we re-run the above analysis to find approximate values for $N$ and $q$ in our three example applications, again assuming $n = 3$ and $sec = 40$.


                F_p : p > 2^2        F_p : p > 2^4        F_{2^8}
δ               N >      log_2 q ≈   N >      log_2 q ≈   N >      log_2 q ≈
1.0066          11300    430         12900    490         9500     360
1.0059          12600    430         14700    500         10900    370
1.0052          14300    430         16700    500         12300    370
1.0041          18600    440         21100    500         15600    370
1.0034          22400    440         25500    500         18800    370

As can be seen, the security level has only a marginal impact on log_2 q, while N roughly doubles as we increase the security level from 80 bits to 256 bits. For comparison, if for a security level of 128 bits (i.e. δ = 1.0052) we increase the value of sec from 40 to 80, we find the following parameter sizes:


                F_p : p > 2^2        F_p : p > 2^4        F_{2^8}
δ               N >      log_2 q ≈   N >      log_2 q ≈   N >      log_2 q ≈
1.0052          18700    560         21000    630         16700    500

E  Functionalities 


Functionality F_Rand

Random Sample: When receiving (rand) from all parties, it samples a uniform bit string r and outputs (rand, r) to all parties.

Random modulo p: When receiving (rand, p) from all parties, it samples a uniform value e ← F_{p^k} and outputs (rand, e) to all parties.

Fig.  14.  The  ideal  functionality  for  coin-flipping. 
Functionality F_AMPC

Initialize: On input (init, p) from all parties, the functionality activates and stores the modulus p.

Rand: On input (rand, P_i, varid) from all parties P_i, with varid a fresh identifier, the functionality picks r ← F_p and stores (varid, r).

Input: On input (input, P_i, varid, x) from P_i and (input, P_i, varid, ?) from all other parties, with varid a fresh identifier, the functionality stores (varid, x).

Add: On command (add, varid_1, varid_2, varid_3) from all parties (if varid_1, varid_2 are present in memory and varid_3 is not), the functionality retrieves (varid_1, x), (varid_2, y) and stores (varid_3, x + y mod p).

Multiply: On input (multiply, varid_1, varid_2, varid_3) from all parties (if varid_1, varid_2 are present in memory and varid_3 is not), the functionality retrieves (varid_1, x), (varid_2, y) and stores (varid_3, x · y mod p).

Output: On input (output, varid) from all honest parties (if varid is present in memory), the functionality retrieves (varid, x) and outputs it to the environment. If the environment inputs OK then x is output to all players. Otherwise ⊥ is output to all players.

Fig. 15. The ideal functionality for arithmetic MPC.
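To make the semantics of F_AMPC concrete, here is a minimal, single-process Python model of the trusted party it describes. The class and method names are hypothetical; each method stands for a command that the figure requires from all parties, and the environment's decision at output time is modelled by a boolean flag, with None standing in for ⊥. It models the ideal world only and is not a protocol.

```python
import secrets
from typing import Dict, Optional


class IdealArithmeticMPC:
    """Toy model of the trusted party in F_AMPC (Fig. 15).

    Hypothetical sketch: every method stands for a command that, in the
    figure, must be received from all parties; here a single caller issues it.
    """

    def __init__(self, p: int) -> None:
        # Initialize: store the modulus p
        self.p = p
        self.memory: Dict[str, int] = {}

    def _require_fresh(self, varid: str) -> None:
        if varid in self.memory:
            raise ValueError(f"identifier {varid!r} is already in use")

    def rand(self, varid: str) -> None:
        # Rand: pick r uniformly from F_p and store (varid, r)
        self._require_fresh(varid)
        self.memory[varid] = secrets.randbelow(self.p)

    def input(self, varid: str, x: int) -> None:
        # Input: store the value supplied by the input owner P_i
        self._require_fresh(varid)
        self.memory[varid] = x % self.p

    def add(self, varid1: str, varid2: str, varid3: str) -> None:
        self._require_fresh(varid3)
        self.memory[varid3] = (self.memory[varid1] + self.memory[varid2]) % self.p

    def multiply(self, varid1: str, varid2: str, varid3: str) -> None:
        self._require_fresh(varid3)
        self.memory[varid3] = (self.memory[varid1] * self.memory[varid2]) % self.p

    def output(self, varid: str, environment_ok: bool = True) -> Optional[int]:
        # Output: the environment sees x first and decides whether the
        # players get x or the abort symbol (modelled as None)
        x = self.memory[varid]
        return x if environment_ok else None


if __name__ == "__main__":
    f = IdealArithmeticMPC(p=101)
    f.input("x", 7)
    f.input("y", 20)
    f.multiply("x", "y", "z")
    print(f.output("z"))  # 7 * 20 mod 101 = 39
```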
Functionality F_Prep

Usage: We first describe two macros, one to produce [v] representations and one to produce ⟨v⟩ representations.

We denote by A the set of players controlled by the adversary.

Bracket(v_1, ..., v_n, Δ_1, ..., Δ_n, β_1, ..., β_n), where v_1, ..., v_n, Δ_1, ..., Δ_n ∈ (F_{p^k})^s and β_1, ..., β_n are the players' MAC keys

1. Let v = Σ_{i=1}^{n} v_i

2. For i = 1, ..., n

(a) The functionality computes the MAC γ(v)_i ← v · β_i and sets γ_i ← γ(v)_i + Δ_i

(b) For every corrupt player P_j, j ∈ A, the environment specifies a share γ_i^j

(c) The functionality sets each share γ_i^j, j ∉ A, uniformly such that Σ_{j=1}^{n} γ_i^j = γ_i

3. The functionality sends (v_i, (β_i, γ_1^i, ..., γ_n^i)) to each honest player P_i (dishonest players already have the respective data).

Angle(v_1, ..., v_n, Δ, α), where v_1, ..., v_n, Δ ∈ (F_{p^k})^s, α ∈ F_{p^k}

1. Let v = Σ_{i=1}^{n} v_i

2. The functionality computes the MAC γ(v) ← α · v and sets γ ← γ(v) + Δ

3. For every corrupt player P_i, i ∈ A, the environment specifies a share γ_i

4. The functionality sets each share γ_i, i ∉ A, uniformly such that Σ_{i=1}^{n} γ_i = γ

5. The functionality sends (0, v_i, γ_i) to each honest player P_i (dishonest players already have the respective data).

Initialize: On input (init, p, k, s) from all players, the functionality stores the prime p and the integers k, s. It then waits for the environment to call either “stop” or “OK”. In the first case the functionality sends “fail” to all honest players and stops. In the second case it does the following:

1. For each corrupt player P_i, i ∈ A, the environment specifies a share α_i

2. The functionality sets each share α_i, i ∉ A, uniformly

3. For each corrupt player P_i, i ∈ A, the environment specifies a key β_i

4. The functionality sets each key β_i, i ∉ A, uniformly

5. The environment specifies Δ_1, ..., Δ_n ∈ (F_{p^k})^s

6. It runs the macro Bracket(Diag(α_1), ..., Diag(α_n), Δ_1, ..., Δ_n, β_1, ..., β_n).

Pair: On input (pair) from all players, the functionality waits for the environment to call either “stop” or “OK”. In the first case the functionality sends “fail” to all honest players and stops. In the second case it does the following:

1. For each corrupt player P_i, i ∈ A, the environment specifies a share r_i

2. The functionality sets each share r_i, i ∉ A, uniformly

3. The environment specifies Δ, Δ_1, ..., Δ_n ∈ (F_{p^k})^s

4. It runs the macros Bracket(r_1, ..., r_n, Δ_1, ..., Δ_n, β_1, ..., β_n) and Angle(r_1, ..., r_n, Δ, α).

Triple: On input (triple) from all players, the functionality waits for the environment to call either “stop” or “OK”. In the first case the functionality sends “fail” to all honest players and stops. In the second case it does the following:

1. For each corrupt player P_i, i ∈ A, the environment specifies shares a_i, b_i

2. The functionality sets each share a_i, b_i, i ∉ A, uniformly. Let a := Σ_{i=1}^{n} a_i and b := Σ_{i=1}^{n} b_i

3. The environment specifies Δ_a, Δ_b, δ ∈ (F_{p^k})^s

4. It sets c ← a · b + δ

5. For each corrupt player P_i, i ∈ A, the environment specifies shares c_i

6. The functionality sets each share c_i, i ∉ A, uniformly with the constraint Σ_{i=1}^{n} c_i = c

7. The environment specifies Δ_c ∈ (F_{p^k})^s

8. It runs the macros Angle(a_1, ..., a_n, Δ_a, α), Angle(b_1, ..., b_n, Δ_b, α), Angle(c_1, ..., c_n, Δ_c, α).

Fig. 16. The ideal functionality for making the global key [α], pairs [r], ⟨r⟩ and triples ⟨a⟩, ⟨b⟩, ⟨c⟩
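The Angle macro can be sketched in Python for the special case k = 1, so that F_{p^k} is the prime field F_p and elements of (F_{p^k})^s are lists of s integers. The function name and argument layout are hypothetical. The sketch shows the role of the offset Δ: the MAC shares always sum to α · v + Δ, so a non-zero Δ chosen by the environment breaks the MAC relation by exactly that amount.

```python
import secrets
from typing import Dict, List, Sequence, Set

Vector = List[int]


def angle(v_shares: Sequence[Vector], delta: Vector, alpha: int, p: int,
          corrupt: Set[int], corrupt_mac_shares: Dict[int, Vector]) -> List[Vector]:
    """Hypothetical sketch of the Angle macro of F_Prep for k = 1.

    Players are indexed 0..n-1; returns the MAC shares gamma_0..gamma_{n-1}.
    """
    n, s = len(v_shares), len(delta)
    honest = [i for i in range(n) if i not in corrupt]
    if not honest:
        raise ValueError("need at least one honest player")
    # Step 1: v = sum of the shares v_i (componentwise, mod p)
    v = [sum(share[t] for share in v_shares) % p for t in range(s)]
    # Step 2: MAC gamma(v) = alpha * v, shifted by the adversarial offset Delta
    gamma = [(alpha * v[t] + delta[t]) % p for t in range(s)]
    # Step 3: the environment fixes the corrupt players' MAC shares
    shares: Dict[int, Vector] = {i: list(corrupt_mac_shares[i]) for i in corrupt}
    # Step 4: honest shares are uniform subject to sum(shares) == gamma
    for i in honest[:-1]:
        shares[i] = [secrets.randbelow(p) for _ in range(s)]
    shares[honest[-1]] = [(gamma[t] - sum(sh[t] for sh in shares.values())) % p
                          for t in range(s)]
    # Step 5 would hand (0, v_i, gamma_i) to each honest P_i
    return [shares[i] for i in range(n)]
```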
Abstract. We describe an implementation of the protocol of Damgård, Pastro, Smart and Zakarias (SPDZ/Speedz) for multi-party computation in the presence of a dishonest majority of active adversaries. We present a number of modifications to the protocol; the first reduces the security to covert security, but produces significant performance enhancements; the second enables us to perform bit-wise operations in characteristic two fields. As a benchmark application we present the evaluation of the AES cipher, a now standard benchmarking example for multi-party computation. We examine two different implementation techniques, which are distinct from prior MPC work in this area due to the use of MACs within the SPDZ protocol. We then examine two implementation choices for the finite fields: one based on finite fields of size 2^8 and one based on embedding the AES field into a larger finite field of size 2^40.


1  Introduction 

The invention of secure multi-party computation is one of the crowning achievements of theoretical cryptography, yet despite being invented around twenty-five years ago it has only recently been implemented and tested in practice. In the last few years a number of MPC “systems” have appeared [4,7-9,12,15,22], as well as experimental research results [13, 16, 21, 25, 26].

The work (both theoretical and practical) can essentially be divided into two camps. On one side we have techniques based on Yao circuits [28], which are mainly focused on two-party computation, and on the other we have techniques based on secret sharing [6, 11], which can be applied to more general numbers of players. This is rather a coarse divide, as some techniques, such as that from [25], only apply in the two-party case but are based on secret sharing as opposed to Yao circuits. Following this coarse divide we can then split the work into that which considers only honest-but-curious adversaries and that which considers more general active adversaries.

As in theory, it turns out that in practice obtaining active security is a much more challenging task, requiring more computational and communication resources. To our knowledge, all prior implementation reports for active adversaries have either been in the two-party setting, or have restricted themselves to the multi-party setting with an honest majority. In the two-party setting one can adopt specialist protocols, such as those based on Yao circuits, whilst the restriction to an honest majority in the multi-party setting means that cheaper information-theoretic constructions can be employed. Recently, Damgård et al. [14], following on from work in [5], presented an actively secure protocol (dubbed “SPDZ” and pronounced “Speedz”) in the multi-party setting which is secure in the presence of a dishonest majority. The paper [14] contains some simple implementation results, and extrapolated estimates, but it does not report on a fully working implementation which computes a specific function.
Whilst active security is the “gold standard” of security, many applications can accept a weaker notion
called  covert  security  [1, 2].  In  this  model  a  dishonest  party  deviating  from  the  protocol  will  be  detected  with 
high  probability;  as  opposed  to  the  overwhelming  probability  required  by  active  security.  Due  to  the  weaker 
requirements,  covert  security  can  often  be  achieved  for  less  computational  effort. 

Our Contribution. As already remarked, much progress has been made on the implementation of MPC protocols in the last few years, but most of the “fast” implementations have been for simpler security models. For example, prior work has focused on protocols for two-party computation only, or for honest-but-curious adversaries only, or for threshold adversaries only. In this work we extend the prior implementation work to the most complex setting, namely covert and active security against a dishonest majority. In addition we examine more than four players, with some experiments being carried out with ten players. Thus our work shows that even such stringent security requirements and parameter settings are beginning to be within reach of practical application of MPC technology.

More concretely, we show how to simplify the SPDZ protocol so that it achieves covert security with greatly improved computational performance, we present the first implementation results for the SPDZ protocol (in both the active and covert cases), and we describe an evaluation of the AES functionality with this protocol. Our protocol implementation is in the random oracle model; specifically, the zero-knowledge proofs required by SPDZ are implemented using the Fiat-Shamir heuristic. We also simplify some other parts of the SPDZ protocol in the random oracle model (details are provided below), and present extensions to enable bit-wise operations in characteristic two fields.

Since the work of [26] it has become common to measure the performance of an MPC protocol by the time it takes to evaluate the AES functionality. This is for a number of reasons. Firstly, AES provides a well understood function which is designed to be highly non-linear; secondly, AES has a regular and highly mathematical structure which allows one to investigate various different optimization techniques in a single function; and thirdly, “oblivious” evaluation of AES is an interesting application in its own right, which could have practical uses if it could be made fast enough.

The paper is structured as follows. We start by covering details of prior work on using MPC to implement AES. In Section 3 we detail the basics of the SPDZ protocol and the minor changes we made to the presentation in [14]. Then in Section 4 we describe how we implemented the S-Box; this is the only non-linear component in AES and so it is the only part which requires interaction. Finally, in Section 5 we present our implementation results.


2  Prior  Work  on  Evaluating  AES  via  MPC  Protocols 

As noted earlier, the first MPC evaluation of the AES functionality was presented in [26]. This paper presented a protocol for the case of two parties, using Yao circuits as the basic building block. On their own, Yao circuits only provide security against semi-honest adversaries, and in this case the authors obtained a run time of 7 seconds to evaluate a single AES block (the model being that party A holds the key, and party B holds a message, with B wishing to obtain the encryption of their message under A’s key). To obtain security against active adversaries a variant of the cut-and-choose methodology of Lindell and Pinkas [20] was used; this resulted in the run time increasing to 19 minutes to evaluate an AES encryption.

In  [15]  Henecka  et  al  again  look  at  two-party  computation  based  on  Yao  circuits,  but  restrict  to  the  case 
of  semi-honest  adversaries  only.  They  reduce  the  run  time  per  block  from  the  previous  7  seconds  down  to  3.3 
seconds.  Huang  et  al  [16]  improve  this  even  further  obtaining  a  time  of  0.2  seconds  per  block  for  semi-honest 
adversaries. 

In [25] the authors present a two-party protocol, but instead of basing it on Yao circuits they base it on OT extension in the Random Oracle Model and a form of “secret sharing with MACs” (similar to the SPDZ protocol which we examine below). This enables the authors to obtain active security and to improve on the prior performance of other implementations. The run time for a single evaluation of the AES circuit is 64 seconds; however, this drops to around 2.5 seconds when amortized over a number of encryption blocks.
The most recent result in the two-party setting is [17], which returns to using Yao circuit based protocols. By clever engineering of the overall run-time design the authors are able to significantly reduce the execution time for a single AES evaluation, down to 1s in the case of active adversaries.

Moving  to  the  case  of  more  than  two  players,  all  prior  implementation  results  have  either  been  for  three 
or  four  players;  and  have  been  in  the  semi-honest  setting  for  the  case  of  three  players.  Like  our  work,  in  this 
setting  one  utilizes  secret  sharing  but  prior  work  has  been  based  on  Shamir  secret  sharing,  or  specialised 
protocols;  and  in  the  case  of  active  security  has  been  based  on  Verifiable  Secret  Sharing. 

The main paper which is related to our work is that of [13], so we now spend some time explaining the differences between our approach and that of [13]. In [13] the authors examine an AES implementation in the case of standard threshold-secret-sharing based MPC protocols. An implementation for one semi-honest adversary amongst three players and one active adversary amongst four players is described using the VIFF framework [12]. The VIFF framework works much like the SPDZ protocol, in that it utilizes Beaver’s [3] method for MPC evaluation. In an Offline Phase “multiplication triples” are produced, and then in an Online Phase the function-specific calculation is performed. The two key differences between the protocol in [13] and the use of SPDZ are that the method to produce the triples is different, and the method to prevent adversaries from cheating during the evaluation of the circuit is also different. These differences arise because [13] is interested in threshold adversary structures, whereas we are interested in the more challenging case of dishonest majority.

The protocol of [13] is, however, similar to our work in that it treats the AES circuit as a circuit over the finite field F_{2^8}, and not as an arbitrary binary circuit. The S-Box in AES is (usually) composed of two operations: an inversion in the field F_{2^8} followed by a linear operation on the bits of the resulting element. In [13] the authors discuss various techniques for computing the inversion, and for the bitwise linear operation they utilize a trick of bit-decomposition of the shared value. This bit-decomposition is itself implemented using the technique of pseudorandom secret sharing (PRSS) of bits.
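For reference, the following Python sketch computes the AES S-Box in the clear, i.e. the function that the MPC protocols discussed here must evaluate on shared data. It follows FIPS-197: inversion in F_{2^8} = F_2[x]/(x^8 + x^4 + x^3 + x + 1), computed here as a^254 with 0 mapped to 0, followed by the bitwise step, which strictly speaking is affine (a linear map plus the constant 0x63). It is not a secure implementation and does not show how [13] or this work evaluate the S-Box.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in F_{2^8} modulo the AES polynomial x^8+x^4+x^3+x+1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse in F_{2^8} as a^254 by square-and-multiply; maps 0 to 0, as AES does."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def sbox(x: int) -> int:
    """AES S-Box: field inversion followed by the FIPS-197 affine transformation."""
    y = gf_inv(x)
    out = 0
    for i in range(8):
        bit = (y >> i) ^ (y >> ((i + 4) % 8)) ^ (y >> ((i + 5) % 8)) \
            ^ (y >> ((i + 6) % 8)) ^ (y >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out


if __name__ == "__main__":
    print(hex(sbox(0x53)))  # FIPS-197 example: 0xed
```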

For MPC protocols based on Shamir secret sharing, obtaining a PRSS is relatively straightforward; indeed it is a local operation assuming some set-up. However, for protocols using secret sharing with MACs (as in our approach) it is unknown how to build a PRSS in such a clean way. Thus we produce such shared random bits by executing another stage in the Offline Phase of the SPDZ protocol. We also present a simplification of the technique in [13] to use such bit-decompositions to implement the S-Box. This approach does, however, assume that the Offline Phase somehow “knows” that the computed function will require shared random bits; this defeats the point of having a function-independent Offline stage and also adds to the run time of the Offline stage. Thus we also present a distinct approach which utilizes a surprising algebraic formulation of the S-Box.

The  implementation  of  [13]  required  less  than  2  seconds  per  AES  block  (including  key  expansion)  when 
computing  with  three  players  and  at  most  one  semi-honest  adversary,  and  less  than  7  seconds  per  AES  block 
when  computing  with  four  players  and  at  most  one  active  adversary.  These  times  include  the  time  for  the 
Offline  Phase.  If  one  is  only  interested  in  the  Online  Phase  times,  then  the  active  adversary  case  can  be 
executed  in  between  three  and  four  seconds  per  AES  block. 

More  recent  work  has  focused  on  the  case  of  semi-honest  adversaries  and  three  players  only.  Two  recent 
results  [18, 19]  have  used  an  additive  secret  sharing  scheme  and  a  novel  multiplication  protocol  to  perform 
semi-honest  three  party  MPC  in  the  presence  of  at  most  one  adversary.  In  [18]  the  authors  present  an  AES 
implementation  using  a  novel  implementation  of  the  S-Box  component  via  an  MPC  table-lookup  procedure. 
They  report  being  able  to  perform  67  AES  block  cipher  evaluations  per  second.  In  [19]  the  authors  report 
on  an  implementation  of  AES,  using  the  Sharemind  framework  [7],  in  which  they  can  accomplish  over  one 
thousand  AES  block  cipher  evaluations  per  second. 

Table 1 summarizes the different performance figures and security models for prior work on implementing AES using multi-party computation, together with a comparison with our own work. As with all network-based protocols, significant time can be spent waiting for data; thus authors have found that executing many calculations in parallel (as in, for example, AES-CTR mode) can yield significant performance enhancements. For papers which report such results we therefore give the improved amortized costs for multiple executions (or just the blocks-per-second count for a single execution if no improvement via amortization occurs). However, single execution costs are still important since they cover the case of (for example) AES-CBC mode. In our implementation we found little gain in performing multiple AES evaluations in parallel.


Paper      Security     Total    Max      Time for      (Amortized)  Expanded  Notes
                        number   number   single AES    blocks       key
                        parties  adv.     block         per sec
[26]       semi-honest  2        1        7.0s          0.1          N         Yao
[15]       semi-honest  2        1        3.3s          0.3          N         Yao
[16]       semi-honest  2        1        0.2s          5.0          Y         Yao
[13]       semi-honest  3        1        1.2s          0.9          N         Shamir
[18]       semi-honest  3        1        N/A           67           Y         Additive
[19]       semi-honest  3        1        1.0s          1893         Y         Additive
[26]       covert       2        1        95s           ≈ 0          N         Yao
This work  covert       2        1        0.17s         10.3         Y         SPDZ
This work  covert       3        2        0.19s         9.6          Y         SPDZ
This work  covert       4        3        0.18s         9.2          Y         SPDZ
This work  covert       5        4        0.19s         7.4          Y         SPDZ
This work  covert       10       9        0.23s         5.2          Y         SPDZ
[26]       active       2        1        19m           ≈ 0          N         Yao
[25]       active       2        1        4.0s          32           N         OT
[17]       active       2        1        1.0s          1.0          Y         Yao
[13]       active       4        1        2.1s          0.5          N         Shamir
This work  active       2        1        0.26s         5.0          Y         SPDZ
This work  active       3        2        0.29s         4.7          Y         SPDZ
This work  active       4        3        0.32s         4.6          Y         SPDZ
This work  active       5        4        0.34s         4.4          Y         SPDZ
This work  active       10       9        0.41s         3.6          Y         SPDZ

Table 1. A comparison of different MPC implementations of AES. We only give the online times for those protocols which have a pre-processing phase. We also note whether the implementation assumes a pre-expanded key or not.


In  interpreting  the  table  one  needs  to  note  that  Yao  based  experiments  usually  implement  a  different 
functionality.  Namely,  the  circuit  constructor  is  the  player  holding  the  key.  Whether  the  key  is  expanded  or 
not  refers  to  whether  the  garbled  circuit  has  this  key  hardwired  in  or  not. 


3  The  SPDZ  Protocol 

We now give an overview of the SPDZ protocol; for more details see [14]. The reader should, however, note that we make a number of minor alterations to the basic protocol, all of which are described below. Some of these alterations are due to us working in the random oracle model (which enables us to simplify a number of sub-protocols), whilst some are simply a functional change in terms of how inputs to the parties are created and distributed. In addition we describe how to simplify the SPDZ protocol to the case of covert adversaries.

The SPDZ protocol, being based on the Beaver circuit randomization technique [3], comes in two phases. In the first phase a large number of random triples are produced, such that each party holds a share of the triple, and such that the underlying values in the triple satisfy a multiplicative relation. This phase is referred to as the “Offline Phase” since the triples do not depend on the function to be evaluated (except that their number must exceed a constant multiple of the number of multiplication gates in the evaluated function), nor on the inputs to the function. In the second phase, called the “Online Phase”, the triples are used to evaluate the function on the given input.
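The two phases can be illustrated with a minimal Python sketch of Beaver's technique over a prime field. The modulus and all function names are illustrative assumptions, the Offline Phase is replaced by a trusted dealer, and the MACs that SPDZ attaches to every share are omitted. Each multiplication consumes one triple and opens two values, x − a and y − b, which are masked by the uniformly random a and b.

```python
import secrets
from typing import List, Tuple

P = 2**61 - 1  # illustrative prime modulus (an assumption, not taken from the text)

Shares = List[int]


def share(x: int, n: int, p: int = P) -> Shares:
    """Additively share x among n parties: x = sum(shares) mod p."""
    parts = [secrets.randbelow(p) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % p]


def reveal(shares: Shares, p: int = P) -> int:
    return sum(shares) % p


def offline_triple(n: int, p: int = P) -> Tuple[Shares, Shares, Shares]:
    """Stand-in for the Offline Phase: a trusted dealer shares (a, b, a*b)."""
    a, b = secrets.randbelow(p), secrets.randbelow(p)
    return share(a, n, p), share(b, n, p), share(a * b % p, n, p)


def online_multiply(x: Shares, y: Shares,
                    triple: Tuple[Shares, Shares, Shares], p: int = P) -> Shares:
    """Online Phase: consume one triple to multiply shared x and y."""
    a, b, c = triple
    # Open eps = x - a and rho = y - b; a and b act as one-time masks
    eps = reveal([(xi - ai) % p for xi, ai in zip(x, a)], p)
    rho = reveal([(yi - bi) % p for yi, bi in zip(y, b)], p)
    # x*y = c + eps*b + rho*a + eps*rho; the public term is added by one party
    z = [(ci + eps * bi + rho * ai) % p for ai, bi, ci in zip(a, b, c)]
    z[0] = (z[0] + eps * rho) % p
    return z


if __name__ == "__main__":
    n = 3
    x, y = 12345, 67890
    z = online_multiply(share(x, n), share(y, n), offline_triple(n))
    assert reveal(z) == x * y % P
```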
The key to understanding the SPDZ protocol is to note that all values are shared with respect to a non-standard secret sharing scheme, which incorporates a MAC value. To describe this secret sharing scheme we fix a finite field F_q. The MAC keys are values α_j ∈ F_q for 1 ≤ j ≤ n_MAC such that player i holds the share α_{j,i} ∈ F_q, where

$$\alpha_j = \alpha_{j,1} + \cdots + \alpha_{j,n}.$$

The shared values are then given by the following sharing of a value a ∈ F_q,

$$\langle a \rangle := \left(\delta,\ (a_1, \ldots, a_n),\ (\gamma_{j,1}, \ldots, \gamma_{j,n})_{j=1}^{n_{\mathrm{MAC}}}\right),$$

where a is the shared value, δ is public and we have the equalities

$$a = a_1 + \cdots + a_n,$$

$$\alpha_j \cdot (a + \delta) = \gamma_{j,1} + \cdots + \gamma_{j,n} \quad \text{for } 1 \le j \le n_{\mathrm{MAC}}.$$

Given this data representing a shared value a, each player P_i holds the data (δ, a_i, (γ_{j,i})_{j=1}^{n_MAC}). To ease notation we write γ_{j,i}(a) to denote the share of the jth MAC on item a held by party i. Arithmetic in this representation is componentwise; more precisely we have

$$\langle a \rangle + \langle b \rangle = \langle a + b \rangle, \quad e \cdot \langle a \rangle = \langle e \cdot a \rangle \quad \text{and} \quad e + \langle a \rangle = \langle e + a \rangle,$$

where

$$e + \langle a \rangle = \left(\delta - e,\ (a_1 + e, a_2, \ldots, a_n),\ (\gamma_{j,1}, \ldots, \gamma_{j,n})_{j=1}^{n_{\mathrm{MAC}}}\right).$$

The simplicity of the above method for adding a constant value to ⟨a⟩ is the reason for the public value δ. In [14] the presentation is simplified to having only n_MAC = 1, although the case of more general values of n_MAC is discussed. In our implementation having n_MAC > 1 will be vital to ensure active security when dealing with small finite fields; thus we present the more general case above.
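The following Python sketch, with hypothetical names and a prime q for simplicity, represents ⟨a⟩ exactly as above and implements the three operations. It shows why the public value δ is convenient: adding a constant changes one data share and δ, but none of the MAC shares. The mac_ok function is only a test oracle, since in SPDZ the keys α_j are themselves secret-shared and never known in the clear.

```python
import secrets
from dataclasses import dataclass
from typing import List


def _additive(x: int, n: int, q: int) -> List[int]:
    """Split x into n additive shares modulo q."""
    parts = [secrets.randbelow(q) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % q]


@dataclass
class Shared:
    """<a> = (delta, (a_1..a_n), (gamma_{j,1..n}) for j = 1..n_MAC)."""
    delta: int
    shares: List[int]        # shares[i] = a_i
    macs: List[List[int]]    # macs[j][i] = gamma_{j,i}


def deal(a: int, alphas: List[int], n: int, q: int) -> Shared:
    # Dealer-style sharing for illustration; SPDZ obtains these from the Offline Phase
    return Shared(0, _additive(a, n, q), [_additive(al * a % q, n, q) for al in alphas])


def add(x: Shared, y: Shared, q: int) -> Shared:
    return Shared((x.delta + y.delta) % q,
                  [(u + v) % q for u, v in zip(x.shares, y.shares)],
                  [[(u + v) % q for u, v in zip(mx, my)] for mx, my in zip(x.macs, y.macs)])


def scale(e: int, x: Shared, q: int) -> Shared:
    return Shared(e * x.delta % q, [e * u % q for u in x.shares],
                  [[e * g % q for g in m] for m in x.macs])


def add_const(e: int, x: Shared, q: int) -> Shared:
    # Only party 1 adds e to its share; the public delta absorbs -e,
    # so alpha_j * (a + delta) and hence every MAC share stays unchanged
    shares = list(x.shares)
    shares[0] = (shares[0] + e) % q
    return Shared((x.delta - e) % q, shares, [list(m) for m in x.macs])


def mac_ok(x: Shared, alphas: List[int], q: int) -> bool:
    # Test oracle only: in SPDZ nobody knows the alpha_j in the clear
    a = sum(x.shares) % q
    return all(al * (a + x.delta) % q == sum(m) % q for al, m in zip(alphas, x.macs))
```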

The SPDZ protocol can tolerate active adversaries and a dishonest majority (ignoring the case where one of the dishonest players aborts) amongst a total of n parties. Thus we can assume that n − 1 of the parties are dishonest and will arbitrarily deviate from the protocol. The SPDZ protocol guarantees that if the protocol terminates then the honest parties know that their resulting output is correct, except with a negligible probability. For active adversaries we set this probability, to mirror the choice in [14], to 2^{−40}. For covert adversaries we adapt the protocol so that the probability that a cheating adversary will be detected is lower bounded by

$$\min\left\{1 - \frac{1}{q^{n_{\mathrm{MAC}}}},\ 1 - \frac{1}{q^{n_{\mathrm{SAC}}}},\ \frac{1}{2\cdot(n-1)}\right\},$$

where n_MAC and n_SAC are parameters to be discussed later and F_q is the finite field over which our triples are defined.

3.1  Offline  Phase 

The  Offline  Phase  makes  use  of  a  somewhat  homomorphic  encryption  (SHE)  scheme,  with  a  distributed 
decryption  procedure,  and  zero-knowledge  proofs.  In  our  implementation  we  use  the  optimized  non-interactive 
zero-knowledge  proofs  of  knowledge  (NIZKPoKs)  derived  from  the  Fiat-Shamir  heuristic  which  are  described 
in  [14].  Thus  our  Offline  Phase  is  only  secure  in  the  Random  Oracle  model. 

The specific SHE scheme used is a variant of the BGV scheme [10] over the mth cyclotomic field. We thus have lattices of dimension φ(m), over a modulus of size Q. Each ciphertext consists of two (or three) polynomials modulo Q of degree less than φ(m). The underlying plaintext space can hold an element of (F_q)^ℓ.

The Offline Phase produces many triples of such sharings ⟨a⟩, ⟨b⟩, ⟨c⟩ such that c = a · b, where these values are authenticated via a global set of n_MAC shared MAC keys as described above. The NIZKPoKs mentioned above have soundness error 1/2, and so, as in [14], we “batch” together sec executions so as to reduce the soundness error to 2^{−sec}. This batching, combined with the vectoral plaintext space, means that a single execution of the Offline Phase produces sec · ℓ triples.

We can trivially modify the Offline Phase so that it also outputs, for characteristic two fields, a set of shared random bits and their associated MACs. We can produce one such shared bit for roughly one third of the cost of one shared triple. As for the shared triples, each invocation of the method to produce shared random bits will produce sec · ℓ bits in one go.

The main cost of the Offline Phase is in the production and verification of the zero-knowledge proofs. For n players, for each proof that a player needs to produce he will need to verify n − 1 proofs of the other players. For the case of covert adversaries we simplify the Offline Phase as follows. We do not batch together proofs, i.e. we take sec = 1, which results in a soundness error of 1/2 for each proof. In addition, when a player receives the n − 1 proofs from the other players, it verifies only one of them, chosen at random. This means that a cheating player will be detected with probability at least 1/(2 · (n − 1)) in the Offline Phase, as opposed to 1 − 2^{−40} when we use the standard actively secure Offline Phase.
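The two detection figures can be reproduced with the following Python sketch; the function names are hypothetical. The simulation takes the view of a single honest verifier and assumes exactly one bad proof among the n − 1 it receives, caught with probability exactly 1/2 when checked. With several honest verifiers choosing independently, the detection probability can only be higher, which is presumably why the text says “at least”.

```python
import random
from fractions import Fraction


def covert_offline_bound(n: int) -> Fraction:
    """1/(2(n-1)): the cheater's proof is checked with probability 1/(n-1)
    and, with sec = 1, a bad proof is rejected with probability 1/2."""
    return Fraction(1, 2 * (n - 1))


def active_offline_detection(sec: int = 40) -> Fraction:
    """Batched proofs: cheating escapes with probability at most 2^-sec."""
    return 1 - Fraction(1, 2 ** sec)


def simulate_one_verifier(n: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate from a single honest verifier's point of view.

    Model assumptions: exactly one of the n-1 received proofs (index 0) is bad,
    and a bad proof is caught with probability exactly 1/2 when checked.
    """
    rng = random.Random(seed)
    caught = 0
    for _ in range(trials):
        if rng.randrange(n - 1) == 0 and rng.random() < 0.5:
            caught += 1
    return caught / trials
```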

3.2  Online  Phase 

Since our Offline Phase is defined in the Random Oracle Model, we alter the Online Phase from [14] so that it too utilizes Random Oracles. This means we can present a more efficient Online Phase than that used in [14]. Our Online Phase makes use of three hash functions. The first, H_1, is used to ensure that broadcast has happened; for this hash function we require that it supports the API of standard hash functions consisting of Init, Update and Finalise methods. The second hash function, H_2, is used to generate random values for checking the linear MAC equations and the triples. The third hash function, H_3, which we model as a random oracle, is used to define a commitment scheme as follows: to commit to a value x, which we denote by Commit(x), one generates a random value r ∈ {0,1}^sec, for some security parameter sec, and computes comm = H_3(x‖r). To open, Open(comm, x, r) verifies that comm = H_3(x‖r), returning x if this is true and ⊥ if it is not.
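A minimal Python sketch of this commitment scheme follows, using SHA-256 as a stand-in for the random oracle H_3 and 128 bits of opening randomness; both choices are assumptions, not taken from the text. Because r has a fixed length, the concatenation x‖r can be parsed unambiguously.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

SEC_BITS = 128  # length of the opening randomness r; an illustrative choice


def commit(x: bytes, sec_bits: int = SEC_BITS) -> Tuple[bytes, bytes]:
    """Commit(x): pick r <- {0,1}^sec and return (comm, r) with comm = H3(x || r)."""
    r = secrets.token_bytes(sec_bits // 8)
    comm = hashlib.sha256(x + r).digest()  # SHA-256 stands in for the random oracle H3
    return comm, r


def open_commitment(comm: bytes, x: bytes, r: bytes) -> Optional[bytes]:
    """Open(comm, x, r): return x if comm == H3(x || r), else None (standing in for ⊥)."""
    ok = hmac.compare_digest(comm, hashlib.sha256(x + r).digest())
    return x if ok else None
```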

The  first  change  we  make  is  in  how  we  guarantee  that  consistent  broadcast  occurs.  For  the  Online  phase 
we  assume  that  the  point-to-point  links  between  the  parties  are  authenticated,  but  we  need  to  guarantee 
that  a  dishonest  party  is  not  allowed  to  send  different  messages  to  different  players  when  he  is  required  to 
broadcast  a  single  value  to  all  players.  This  is  done  by  modifying  the  notion  of  a  “partial  opening”  from  [14] 
and  the  notion  of  “broadcast”.  The  “broadcasts”  are  ensured  to  be  correct  via  the  parties  maintaining  a 
hash  of  all  values  received.  This  is  checked  before  the  output  is  reconstructed;  thus  in  the  final  broadcast  to 
recover  the  output  we  utilize  the  re-transmit  method  from  [14]  to  check  consistency  of  the  final  broadcast. 

In the original protocol a "partial opening" just means a broadcast of the share of a value held by a party, but not the broadcast of the share of the MAC on that value. Thus only the value is opened, not the MAC on the value. However, we ensure that each player maintains the running totals of the linear equations they will eventually check. In [14] these linear equations were of the form $\sum_k e^k \cdot a_k$ (and similarly for the MAC shares), where the $a_k$ are the partially opened values and $e$ is some random agreed value.

This gives an error probability of $T/q$, where $T$ is the number of partial openings in an execution of the Online Phase. For small values of $q$ this is not effective, thus we replace the values $e^k$ by the output of the hash function $H_2$. In Figure 1 we describe our modified partial opening and broadcast protocol, which maintains a hash value of all values broadcast, as well as a method for checking consistency.
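To see why $T/q$ is useless for small fields, the sketch below compares it with the $1/q$ obtained when every coefficient is independently uniform. The value $T = 6000$ is a hypothetical count of partial openings, chosen only for illustration.

```python
from fractions import Fraction


def err_single_e(T, q):
    """Bound T/q (capped at 1) when the coefficients are e, e^2, ..., e^T for one random e."""
    return min(Fraction(T, q), Fraction(1))


def err_independent(q):
    """Cheating probability 1/q when each coefficient is independently uniform in F_q."""
    return Fraction(1, q)


T = 6000  # hypothetical number of partial openings in one Online Phase run
for k in (8, 40):
    q = 2**k
    print(f"q = 2^{k}: T/q bound = {float(err_single_e(T, q)):.3g}, "
          f"independent = {float(err_independent(q)):.3g}")
```

For $q = 2^8$ the $T/q$ bound is vacuous (it caps at 1), which is the motivation for deriving fresh coefficients from $H_2$.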

In the Online Phase the key issue is that the triples produced by the Offline Phase may not satisfy the relation $c = a \cdot b$, nor may the MACs verify. This is because we do not ensure that the dishonest parties were "well behaved" in the Offline Phase. Thus these two properties must be checked. The Online Protocol of [14] does this as follows. To check that $c = a \cdot b$ for the triples we will use for the MPC evaluation, we "sacrifice" a set of $n_{SAC}$ extra triples per evaluated triple. For the sacrificing method in our implementation, we adopted the naive method of [14]. This results in consuming more triples, but is simpler computationally. To check the MAC values, a series of $n_{MAC}$ linear equations are checked at the end of the Online Phase.

Each triple sacrifice and MAC equation check can be made to hold by the adversary with probability $1/q$. Thus to reduce this to something negligible we sacrifice many triples and utilize many MAC equations. But in the case of covert adversaries we select $n_{MAC} = n_{SAC} = 1$, and so the probability of a cheating adversary being detected is bounded from below by $1 - 1/q$.
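Since each check fails for a cheater except with probability $1/q$, reaching a cheating probability of $2^{-40}$ needs $\lceil 40 / \log_2 q \rceil$ independent checks. The sketch below (illustrative helper names) reproduces the values used later in the paper, $n_{MAC} = n_{SAC} = 5$ for $\mathbb{F}_{2^8}$ and 1 for $\mathbb{F}_{2^{40}}$, and the covert detection bound with a single check.

```python
import math
from fractions import Fraction


def checks_needed(field_bits, target_bits=40):
    """Smallest n with q^-n <= 2^-target_bits, where q = 2^field_bits."""
    return math.ceil(target_bits / field_bits)


def covert_detection(q):
    """With n_MAC = n_SAC = 1 a cheat escapes a single check with probability 1/q."""
    return 1 - Fraction(1, q)


print(checks_needed(8), checks_needed(40))   # active security, 2^-40
print(checks_needed(8, 80))                  # 2^-80 doubles the work in F_{2^8}
print(covert_detection(2**8))
```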
Init(): We initialize the following data:

1. Party $i$ executes $H_1$.Init().

2. Party $i$ sets $\mathrm{cnt}_i = 0$.

3. For $j = 1, \ldots, n_{MAC}$

(a) Party $i$ sets $a_{j,i} = 0$ and $\gamma_{j,i} = 0$.

4. Party $i$ generates a random value $\mathrm{seed}_i \in \{0,1\}^{\mathrm{sec}}$ and sends it to all other players.

Broadcast($v_i$): We broadcast $v_i$ and receive the equivalent broadcasts from the other players:

1. Party $i$ sends $v_i$ to each player.

2. On receipt of $\{v_1, \ldots, v_n\} \setminus \{v_i\}$, execute $H_1$.Update($v_1 \| \ldots \| v_n$).

3. Return $\{v_1, \ldots, v_n\}$.

PartialOpen($\langle a \rangle$): Party $i$ obtains the partial opening of the shared value and updates their partial sums:

1. Execute $\{a_1, \ldots, a_n\} \leftarrow$ Broadcast($a_i$).

2. $a = a_1 + \cdots + a_n$.

3. $(e_1 \| \ldots \| e_{n_{MAC}}) = H_2(0 \| \mathrm{seed}_1 \| \ldots \| \mathrm{seed}_n \| \mathrm{cnt}_i)$, with each $e_j \in \mathbb{F}_q$.

4. $\mathrm{cnt}_i = \mathrm{cnt}_i + 1$.

5. For $j = 1, \ldots, n_{MAC}$

(a) $a_{j,i} = a_{j,i} + e_j \cdot a$.

(b) $\gamma_{j,i} = \gamma_{j,i} + e_j \cdot \gamma_{j,i}(a)$.

6. Return $a$.

Verify(): We check that all broadcasts have been consistent:

1. Party $i$ computes $h_i \leftarrow H_1$.Finalise() and sends $h_i$ to each player.

2. On receipt of $h_j$ from player $j$, if $h_i \neq h_j$ then abort.

Fig. 1. Methods for Partial Opening and Broadcast for Party i
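The methods of Figure 1 can be sketched as a single-process simulation. This is a toy illustration under several assumptions: all parties run in one process, arithmetic is in a toy prime field $q = 2^{31}-1$ rather than $\mathbb{F}_{2^k}$, SHA-256 (whose `update`/`digest` interface matches Init/Update/Finalise) plays $H_1$, SHAKE-256 plays $H_2$, and the final MAC test is a simplified SPDZ-style check in which the commit-then-open of each party's $\sigma$ value is omitted. All names are hypothetical.

```python
import hashlib
import secrets

Q = 2**31 - 1  # toy prime field (assumption); the paper works over F_{2^k}


def h2_coeffs(seeds, cnt, n_mac):
    """(e_1 || ... || e_nMAC) = H_2(0 || seed_1 || ... || seed_n || cnt), SHAKE-256 as H_2."""
    data = b"\x00" + b"".join(seeds) + cnt.to_bytes(8, "big")
    out = hashlib.shake_256(data).digest(8 * n_mac)
    return [int.from_bytes(out[8 * j:8 * j + 8], "big") % Q for j in range(n_mac)]


def share(x, n):
    """Additive sharing of x modulo Q among n parties."""
    s = [secrets.randbelow(Q) for _ in range(n - 1)]
    return s + [(x - sum(s)) % Q]


class Party:
    def __init__(self, i, n_mac, seeds):
        self.i = i
        self.h1 = hashlib.sha256()   # Init step 1: H_1.Init()
        self.cnt = 0                 # Init step 2
        self.a_acc = [0] * n_mac     # Init step 3: a_{j,i} = 0
        self.g_acc = [0] * n_mac     #              gamma_{j,i} = 0
        self.seeds = seeds           # Init step 4, assumed already exchanged

    def broadcast(self, received):
        """Broadcast step 2: absorb v_1 || ... || v_n into H_1."""
        self.h1.update(b"".join(v.to_bytes(8, "big") for v in received))
        return received

    def partial_open(self, received, my_mac_shares):
        a = sum(self.broadcast(received)) % Q
        e = h2_coeffs(self.seeds, self.cnt, len(self.a_acc))
        self.cnt += 1
        for j, e_j in enumerate(e):
            self.a_acc[j] = (self.a_acc[j] + e_j * a) % Q
            self.g_acc[j] = (self.g_acc[j] + e_j * my_mac_shares[j]) % Q
        return a

    def finalise(self):
        return self.h1.digest()      # Verify step 1: h_i = H_1.Finalise()


def run(n=3, n_mac=2, values=(5, 7, 11), tamper=False, equivocate=False):
    """Simulate all parties; return (hashes agree, simplified MAC check passes)."""
    seeds = [secrets.token_bytes(16) for _ in range(n)]
    alphas = [secrets.randbelow(Q) for _ in range(n_mac)]
    alpha_sh = [share(al, n) for al in alphas]
    parties = [Party(i, n_mac, seeds) for i in range(n)]
    for k, x in enumerate(values):
        x_sh = share(x, n)
        mac_sh = [share(al * x % Q, n) for al in alphas]
        if tamper and k == 0:
            x_sh[0] = (x_sh[0] + 1) % Q        # party 0 broadcasts a wrong share to everyone
        for p in parties:
            seen = list(x_sh)
            if equivocate and k == 0 and p.i == 1:
                seen[0] = (seen[0] + 1) % Q    # party 0 sends party 1 a different value
            p.partial_open(seen, [mac_sh[j][p.i] for j in range(n_mac)])
    hashes_agree = len({p.finalise() for p in parties}) == 1
    macs_ok = all(
        sum(p.g_acc[j] - alpha_sh[j][p.i] * p.a_acc[j] for p in parties) % Q == 0
        for j in range(n_mac)
    )
    return hashes_agree, macs_ok


print(run(), run(tamper=True), run(equivocate=True))
```

An inconsistent broadcast shows up as differing $H_1$ digests, while a consistent but wrong share shows up in the MAC equations.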


Both  of  these  checks  require  that  the  parties  agree  on  some  global  random  values  at  different  points  in 
the  protocol.  In  [14]  these  extra  shared  values  are  determined  in  the  Offline  Phase,  via  a  different  form  of 
secret  sharing;  with  the  sharings  being  opened  at  the  critical  point  in  the  Online  protocol.  The  benefit  of  this 
approach  is  that  one  obtains  a  protocol  which  is  UC  secure  without  the  need  for  Random  Oracles;  however 
the  down-side  is  that  the  Offline  Phase  becomes  relatively  complex.  In  our  work  we  take  the  view  that  since 
Random  Oracles  have  been  used  in  the  Offline  Phase  one  might  as  well  exploit  them  in  the  Online  Phase. 
Thus  these  shared  values  are  obtained  via  a  Random  Oracle  based  commitment  scheme  as  we  now  describe. 

The next alteration we make to the Online Phase of [14] is that we assume that the players' shares of
the  input  values  are  “magically  distributed”  to  them.  This  can  be  justified  in  two  ways.  Firstly  we  are  only 
interested  in  timing  the  main  Offline  and  Online  Protocol  and  the  input  distribution  phase  is  just  an  added 
complication.  Secondly,  a  key  application  scenario  for  MPC  is  when  the  players  are  computing  a  function  on 
behalf  of  some  client.  In  such  a  situation  the  players  do  not  themselves  have  any  input,  it  is  the  client  which 
has  input.  In  such  a  situation  the  players  would  obtain  their  respective  input  shares  directly  from  the  client; 
thus  eliminating  the  need  entirely  for  a  special  protocol  to  deal  with  obtaining  the  input  shares. 

Our final alteration is that we utilize a new online operation, in addition to local addition and multiplication, called BitDecomposition. Suppose we are given a sharing $\langle a \rangle$ of a finite field element $a \in \mathbb{F}_{2^k} = \mathbb{F}_2[X]/F(X)$, and a set of $k$ randomly shared bits $\langle r_i \rangle$ for $i = 0, \ldots, k-1$. Write $a$ as $\sum_{i=0}^{k-1} a_i \cdot X^i$; our goal is to produce $\langle a_i \rangle$. Firstly, via a local operation, we compute a sharing of $r = \sum_i r_i \cdot X^i$ by computing $\langle r \rangle = \sum_i \langle r_i \rangle \cdot X^i$. Then we produce a masked value of $a$ via $\langle c \rangle = \langle a \rangle + \langle r \rangle$. The value of $\langle c \rangle$ is then opened to reveal $c$ and we compute the decomposition $c = \sum_i c_i \cdot X^i$. Since addition in characteristic two acts bitwise, $a_i = c_i + r_i$, so we can locally compute $\langle a_i \rangle = c_i + \langle r_i \rangle$. Note that if $a$ is known to be in a subfield of $\mathbb{F}_{2^k}$, as it will be in one of our implementations for $k = 40$, we can utilize the embedding of the subfield into the larger field to reduce the number of shared random bits needed for this decomposition down to the degree of the subfield. We refer to Appendix A for more details.
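The BitDecomposition steps can be sketched for $k = 8$ as follows, assuming plain additive (XOR) sharing over $\mathbb{F}_{2^8} = \mathbb{F}_2[X]/(X^8+X^4+X^3+X+1)$, no MACs, and random shared bits dealt by a stand-in for the Offline Phase. Note that shares of a bit are arbitrary field elements, so forming $\langle r \rangle$ needs genuine field multiplication by $X^i$. Helper names are hypothetical.

```python
import secrets

AES_POLY = 0x11B  # X^8 + X^4 + X^3 + X + 1


def gf_mul(a, b):
    """Multiply two elements of F_{2^8}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
    return r


def share(v, n):
    """Additive sharing over F_{2^8}; addition is XOR."""
    s = [secrets.randbelow(256) for _ in range(n - 1)]
    last = v
    for x in s:
        last ^= x
    return s + [last]


def reveal(shares):
    v = 0
    for x in shares:
        v ^= x
    return v


def bit_decompose(a_sh, r_bit_sh):
    """a_sh[p]: party p's share of a; r_bit_sh[i][p]: party p's share of bit r_i."""
    n = len(a_sh)
    r_sh = [0] * n
    for i in range(8):                      # local: <r> = sum_i <r_i> * X^i
        for p in range(n):
            r_sh[p] ^= gf_mul(r_bit_sh[i][p], 1 << i)
    c = reveal([a_sh[p] ^ r_sh[p] for p in range(n)])   # open c = a + r
    out = []
    for i in range(8):                      # local: <a_i> = c_i + <r_i>
        c_i = (c >> i) & 1
        out.append([r_bit_sh[i][p] ^ (c_i if p == 0 else 0) for p in range(n)])
    return out


r_bits = [secrets.randbelow(2) for _ in range(8)]
bits = bit_decompose(share(0x53, 3), [share(b, 3) for b in r_bits])
print([reveal(s) for s in bits])  # bits of 0x53, least significant first
```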

Given  these  alterations  to  the  Online  Phase  of  [14]  we  present  the  modified  protocol  in  Figure  2  of  the 
Appendix. 
4 S-Box Implementation


We present two distinct methodologies to implement the S-Box. The first requires the Offline Phase to only produce multiplication triples, and utilizes the algebraic properties of the S-Box. The second requires the Offline Phase to also produce sharings (and associated MACs) of random bits.


4.1  S-Box  Via  Algebraic  Operations 

A key design criterion of any block cipher is that it should be highly non-linear. In addition it should be hard to write down a series of simple algebraic equations to describe the cipher, since such equations could give rise to an attack via algebraic cryptanalysis. Indeed, one reason for choosing AES as an example benchmark for MPC protocols is that, being a block cipher, it should be highly non-linear and hence a challenge for MPC protocols. However, as was soon realised after the standardization of AES, the S-Box (the only non-linear component in the entire cipher) can be represented in a relatively clean algebraic manner.

Our algebraic method to implement the S-Box operation is based on the analysis of AES by Murphy and Robshaw [23]. In this work the authors demonstrate that AES can actually be described by (relatively simple) algebraic formulae over $\mathbb{F}_{2^8}$; in other words, the transform between byte-wise and bit-wise operations in the standard representation of the AES S-Box is a bit of a MacGuffin.

Recall that the AES S-Box consists of an inversion in $\mathbb{F}_{2^8}$ (which is indeed a highly non-linear function) followed by a linear operation over the bits of the result. This is usually explained by saying that the mixture of the two operations in two distinct finite fields "breaks any algebraic structure". This was shown to be false in [23]. Indeed one can express the S-Box calculation via the following simple polynomial

$$\text{S-Box}(z) = \mathtt{0x63} + \mathtt{0x8F} \cdot z^{127} + \mathtt{0xB5} \cdot z^{191} + \mathtt{0x01} \cdot z^{223} + \mathtt{0xF4} \cdot z^{239} + \mathtt{0x25} \cdot z^{247} + \mathtt{0xF9} \cdot z^{251} + \mathtt{0x09} \cdot z^{253} + \mathtt{0x05} \cdot z^{254},$$

where (as is usual) operations are in the finite field defined by $\mathbb{F}_{2^8} = \mathbb{F}_2[x]/(x^8 + x^4 + x^3 + x + 1)$ and the notation 0x12 represents the element defined by the polynomial $x^4 + x$. That the operation can be defined by a polynomial of degree bounded by 255 is not surprising, since by interpolation any function from $\mathbb{F}_{2^8}$ to $\mathbb{F}_{2^8}$ can be represented in such a way. What is surprising is that the polynomial is relatively sparse; however, this can be easily shown from first principles.
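As a sanity check, the sketch below evaluates the sparse polynomial above at all 256 field elements and compares it with the textbook S-Box definition (inversion with $0 \mapsto 0$, followed by the standard affine map written with byte rotations). The helper names are our own.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
    return r


def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def rotl8(v, k):
    return ((v << k) | (v >> (8 - k))) & 0xFF


def sbox_reference(z):
    """Inversion (z^254, so 0 -> 0) followed by the AES affine map."""
    y = gf_pow(z, 254)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


SPARSE = {0: 0x63, 127: 0x8F, 191: 0xB5, 223: 0x01, 239: 0xF4,
          247: 0x25, 251: 0xF9, 253: 0x09, 254: 0x05}


def sbox_poly(z):
    acc = 0
    for k, c in SPARSE.items():
        acc ^= gf_mul(c, gf_pow(z, k))
    return acc


print(all(sbox_poly(z) == sbox_reference(z) for z in range(256)))
```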

Lemma 1. The AES S-Box can be represented by a polynomial which has a non-zero coefficient for the term $z^i$ if and only if $i \in \{0, 127, 191, 223, 239, 247, 251, 253, 254\}$.


Proof. Recall that the AES S-Box consists first of the inversion $z \mapsto z^{-1} = y$, followed by an $\mathbb{F}_2$-linear operation $\mathbf{w} = A \cdot \mathbf{y} + \mathbf{b}$ on the bits of the result, where $\mathbf{y}$ are the bits in $y$. The bit matrix $A$ and the bit vector $\mathbf{b}$ are fixed. The final result is obtained by forming the dot-product of the $(\mathbb{F}_2)^8$ vector $\mathbf{w}$ with the fixed vector $\mathbf{X} = (1, x, x^2, x^3, x^4, x^5, x^6, x^7) \in (\mathbb{F}_{2^8})^8$.

First note that inversion in $\mathbb{F}_{2^8}$ can be accomplished by computing $z^{-1} = z^{254}$, since $z^{255} = 1$ for all $z \neq 0$. The AES standard "defines" $0^{-1} = 0$, and so the formula $z^{254}$ can be applied even when $z = 0$ as well.

We then note that the bits $\mathbf{y} = (y_0, \ldots, y_7) \in (\mathbb{F}_2)^8$ of an element $y = y_0 + y_1 \cdot x + \cdots + y_7 \cdot x^7$ can be obtained via a linear operation on the action of Frobenius on $y$. This follows since Frobenius acts as a linear map, and hence by applying Frobenius eight times we find eight linear equations linking the set $\{y_0, \ldots, y_7\}$ with the Frobenius actions on $y$. This in turn allows us to solve for the bits $\mathbf{y} = (y_0, \ldots, y_7)$. Thus there is a matrix $B \in (\mathbb{F}_{2^8})^{8 \times 8}$ such that

$$\mathbf{y} = B \cdot (y, y^2, y^4, y^8, y^{16}, y^{32}, y^{64}, y^{128})^T.$$

Hence, the output of the S-Box can be written as

$$\begin{aligned} \text{S-Box}(z) &= \mathbf{X} \cdot (A \cdot \mathbf{y} + \mathbf{b}) \\ &= \mathbf{X} \cdot (A \cdot B) \cdot (y, y^2, y^4, y^8, y^{16}, y^{32}, y^{64}, y^{128})^T + \mathbf{X} \cdot \mathbf{b} \\ &= \mathbf{s} \cdot (1, y, y^2, y^4, y^8, y^{16}, y^{32}, y^{64}, y^{128})^T, \end{aligned}$$

where $\mathbf{s}$ is a fixed nine-dimensional vector over $\mathbb{F}_{2^8}$. On replacing $y$ with $z^{254}$ in the above equation, and using $z^{255} = 1$ for $z \neq 0$ (so that $y^{2^j} = z^{-2^j} = z^{255 - 2^j}$, giving exactly the exponents 254, 253, 251, 247, 239, 223, 191 and 127), we obtain our result. The result also follows for $z = 0$ by inspection.
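Lemma 1 can also be checked independently by interpolation. Over $\mathbb{F}_{256}$ (characteristic two) the coefficients of the unique polynomial of degree at most 255 agreeing with $f$ are $c_0 = f(0)$, $c_k = \sum_{z \neq 0} f(z)\, z^{255-k}$ for $1 \le k \le 254$, and $c_{255} = c_0 + \sum_{z \neq 0} f(z)$. The sketch below applies these formulas to the textbook S-Box; helper names are our own.

```python
AES_POLY = 0x11B


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
    return r


def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def rotl8(v, k):
    return ((v << k) | (v >> (8 - k))) & 0xFF


def sbox_reference(z):
    y = gf_pow(z, 254)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


def coefficients(values):
    """values[z] = f(z) for z in 0..255; returns c_0..c_255 of the interpolating polynomial."""
    c = [0] * 256
    c[0] = values[0]
    for z in range(1, 256):
        p = 1
        for e in range(1, 255):          # p = z^e with e = 255 - k
            p = gf_mul(p, z)
            c[255 - e] ^= gf_mul(values[z], p)
    total = 0
    for z in range(1, 256):
        total ^= values[z]
    c[255] = total ^ values[0]
    return c


coeffs = coefficients([sbox_reference(z) for z in range(256)])
print(sorted(k for k in range(256) if coeffs[k]))
```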

Finally, to implement the S-Box we therefore need an efficient method to obtain, from a shared input value $z$, the shared values of the elements $\{z^{127}, z^{191}, z^{223}, z^{239}, z^{247}, z^{251}, z^{253}, z^{254}\}$. This is equivalent to finding a short addition chain for the set $\{127, 191, 223, 239, 247, 251, 253, 254\}$. We found the shortest such addition chain consists of eighteen additions and is the chain

{1, 2, 3, 6, 12, 15, 24, 48, 63, 64, 96, 127, 191, 223, 239, 247, 251, 253, 254}.

Thus evaluating a single S-Box requires eighteen MPC multiplication operations, as well as some local computation. Hence, to evaluate the entire AES cipher we require $18 \cdot 16 \cdot 10 = 2880$ MPC multiplications.

Looking ahead, each multiplication operation will require interaction, and to reduce execution times we need to ensure that each player is kept "busy", i.e. is not left waiting for data to arrive. To do this we will interleave various different multiplications together, essentially exploiting the instruction level parallelism (ILP) within the basic AES algorithm. Clearly one can execute each of the 16 S-Box operations in a single round in parallel, thus obtaining an immediate 16-fold factor of ILP. However, further ILP can be exploited in the addition chain above, as can be seen from its graphical realisation in Figure B in the Appendix. We see that the addition chain can be executed in twelve parallel multiplication steps; thus the total number of rounds of multiplication needed for the entire AES cipher will be $12 \cdot 10 = 120$.
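The counts above can be checked mechanically. The sketch below verifies that every element of the chain is a sum of two earlier elements, counts the eighteen multiplications, and computes the minimum multiplication depth of each element, assuming all multiplications at the same depth can run in one parallel round. Squarings are counted as multiplications here, since squaring a shared value does not square its MAC correctly.

```python
CHAIN = [1, 2, 3, 6, 12, 15, 24, 48, 63, 64, 96, 127, 191, 223, 239, 247, 251, 253, 254]
TARGETS = {127, 191, 223, 239, 247, 251, 253, 254}


def chain_depths(chain):
    """Minimum multiplication depth of each chain element (x = a + b, a and b earlier)."""
    depth = {chain[0]: 0}
    for x in chain[1:]:
        options = [max(depth[a], depth[x - a]) + 1 for a in depth if (x - a) in depth]
        if not options:
            raise ValueError(f"{x} is not a sum of two earlier chain elements")
        depth[x] = min(options)
    return depth


d = chain_depths(CHAIN)
mults = len(CHAIN) - 1
rounds = max(d[t] for t in TARGETS)
print(mults, rounds)                  # per S-Box
print(mults * 16 * 10, rounds * 10)   # whole AES: multiplications, rounds
```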

4.2  S-Box  Via  BitDecomposition 

As explained in [13], the S-Box can be implemented if one has access to shared random bits, via the BitDecomposition operation. In our second implementation choice we extend this technique, and reduce even further the amount of interaction needed to compute the S-Box.

We use this BitDecomposition trick in two ways. The first way is to decompose an element in $\mathbb{F}_{2^8}$ into its bit components, so as to apply the linear map of the S-Box. This part is exactly as described in [13], except that when we open the value of $\langle c \rangle$ we perform a partial opening, leaving the checking of the MACs until the end.

In our second application we use BitDecomposition to implement the operation $x \mapsto x^{-1}$. This is done as follows. We decompose $x$ into its constituent bits. Then the operations $x \mapsto x^2$, $x \mapsto x^4$, ..., $x \mapsto x^{128}$ are all linear operations, and so can be performed locally. Finally the value of $x^{254} = x^{-1}$ is computed via the combination

$$x^{254} = \big((x^2 \cdot x^4) \cdot (x^8 \cdot x^{16})\big) \cdot \big((x^{32} \cdot x^{64}) \cdot x^{128}\big),$$

which requires a total of six multiplications. We could reduce this down to four multiplications by applying the Frobenius map to other elements [27], but this would consume even more random bits per S-Box; thus we settled for the above implementation, which consumes 16 sharings of random bits per S-Box invocation.
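The combination above can be checked in the clear. In the sketch below (hypothetical names) the squarings are computed directly and not counted, standing in for the local linear operations available after bit decomposition; only products of two distinct shared values are counted.

```python
AES_POLY = 0x11B


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
    return r


def inverse_via_frobenius(x):
    """Return (x^254, number of counted multiplications)."""
    mults = 0

    def mul(u, v):
        nonlocal mults
        mults += 1
        return gf_mul(u, v)

    powers = [x]                       # powers[i] = x^(2^i), local after BitDecomposition
    for _ in range(7):
        powers.append(gf_mul(powers[-1], powers[-1]))
    _, x2, x4, x8, x16, x32, x64, x128 = powers
    left = mul(mul(x2, x4), mul(x8, x16))      # x^30
    right = mul(mul(x32, x64), x128)           # x^224
    result = mul(left, right)                  # x^254
    return result, mults


inv, count = inverse_via_frobenius(0x53)
print(hex(inv), count)
```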

5  Experimental  Results 

We implemented the SPDZ protocol over finite fields of characteristic two and used it to evaluate the AES function, with the S-Box implemented using both the algebraic formulation described earlier and the variant via BitDecomposition. As described earlier, we examined the case of dealing with both covert adversaries and fully malicious (a.k.a. active) adversaries (with cheating probability of $2^{-40}$). We note that the cheating probability could be reduced to smaller values, but we used $2^{-40}$ so as to be comparable with the theoretical run-time estimates given in [14]. For example, reducing the probability down to $2^{-80}$ would essentially require a doubling of the cost of both the Offline and Online stages.

The first decision one needs to take is which finite field one should work with. Since we are evaluating AES it is natural to pick the field

$$K_8 = \mathbb{F}_2[x]/(x^8 + x^4 + x^3 + x + 1).$$
Another choice, particularly suited to our active adversary cheating probability of $2^{-40}$, would be to use the field

$$K_{40} = \mathbb{F}_2[y]/(y^{40} + y^{20} + y^{15} + y^{10} + 1).$$

Using this finite field has the advantage that, for active adversaries, we only need to keep one MAC share per data item, and only one triple per multiplication needs to be sacrificed. In addition the field $K_8$ lies in $K_{40}$ via the embedding $x \mapsto y^5 + 1$. For comparison with the Offline Phase over these fields, we also implemented the Offline protocol over a finite field with $q$ a 64-bit prime.
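The embedding can be checked directly: $y^5 + 1$ must be a root of the $K_8$ modulus $x^8 + x^4 + x^3 + x + 1$ inside $K_{40}$. The sketch below (our own helper names, elements stored as integer bit masks) checks this, then maps each $K_8$ element $\sum a_i x^i$ to $\sum a_i (y^5+1)^i$ and spot-checks that the map is injective and multiplicative (0x53 and 0xCA are inverses in $K_8$).

```python
K40_MOD = (1 << 40) | (1 << 20) | (1 << 15) | (1 << 10) | 1  # y^40 + y^20 + y^15 + y^10 + 1
X_IN_K40 = (1 << 5) | 1                                       # y^5 + 1


def mul40(a, b):
    """Multiply two elements of K_40 (bit masks of polynomials in y)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 40:
            a ^= K40_MOD
    return r


def aes_poly_at(t):
    """Evaluate x^8 + x^4 + x^3 + x + 1 at x = t in K_40."""
    t2 = mul40(t, t)
    t3 = mul40(t2, t)
    t4 = mul40(t2, t2)
    t8 = mul40(t4, t4)
    return t8 ^ t4 ^ t3 ^ t ^ 1


def embed(a):
    """Map sum a_i x^i in K_8 to sum a_i (y^5 + 1)^i in K_40."""
    result, power = 0, 1
    for i in range(8):
        if (a >> i) & 1:
            result ^= power
        power = mul40(power, X_IN_K40)
    return result


print(aes_poly_at(X_IN_K40))
```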

We also experimented with various numbers of players, and different values of $n_{MAC}$ and $n_{SAC}$. As explained in [14], all such variants lead to different basic parameters $(m, Q, \ell)$ of the underlying SHE scheme.

We now determine values of $(m, Q, \ell)$ for our SHE scheme given a specific finite field $\mathbb{F}_q$ (or, in the case of $q$ prime, a rough size for $q$), a value for sec (the number of NIZKPoKs we run in parallel in the Offline stage), and the number of players $n$. As a "lattice security parameter" we selected $\delta = 1.0052$, which corresponds to roughly 128 bits of symmetric security.

We require finite fields $\mathbb{F}_q$ of size $2^8$ and $2^{40}$, as well as, for comparison, a finite field where $q$ is a 64-bit prime. We also looked for parameters for $n \in \{2, 3, 4, 5, 10\}$ and sec $\in \{1, 40\}$. As in [14], we first searched for a rough estimate of the parameters $(m, Q)$ which fit these needs:


char(F_q) | n          | sec | N     | log2(Q)
2         | 2 ≤ n ≤ 10 | 40  | 12300 | 370
2         | 2 ≤ n ≤ 5  | 1   | 8000  | 200
2         | 10         | 1   | 8000  | 210
≈ 2^64    | 2 ≤ n ≤ 10 | 40  | 16700 | 500
≈ 2^64    | 2 ≤ n ≤ 5  | 1   | 11000 | 330
≈ 2^64    | 10         | 1   | 11300 | 340

We then selected values for $m$ as follows:

$\mathbb{F}_{2^8}$ and $\mathbb{F}_{2^{40}}$, sec = 40: We select $m = 17425$, which gives us $\phi(m) = 12800$. The polynomial $\Phi_m(X)$ factors modulo two into $\ell = 320$ factors each of degree 40. Thus these parameters can support both our finite fields $\mathbb{F}_{2^8}$ and $\mathbb{F}_{2^{40}}$.

$\mathbb{F}_{2^8}$, sec = 1: We select $m = 13107$, which gives us $\phi(m) = 8192$. The polynomial $\Phi_m(X)$ factors modulo two into $\ell = 512$ factors each of degree 16.

$\mathbb{F}_{2^{40}}$, sec = 1: We select $m = 13175$, which gives us $\phi(m) = 9600$. The polynomial $\Phi_m(X)$ factors modulo two into $\ell = 240$ factors each of degree 40.

$p \approx 2^{64}$, sec = 40: We select, as in [14], $p = 2^{64} + 4867$ and $m = 16729$ so that $\ell = \phi(m) = 16728$.

$p \approx 2^{64}$, sec = 1: We select, as in [14], $p = 2^{64} + 8947$ and $m = 11971$ so that $\ell = \phi(m) = 11970$.
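The slot counts follow from a standard fact: modulo 2, $\Phi_m(X)$ splits into $\phi(m)/d$ irreducible factors of degree $d = \mathrm{ord}_m(2)$, the multiplicative order of 2 modulo $m$. The sketch below (our own helper names) recomputes $\phi(m)$, $d$ and $\ell$ for the characteristic-two choices, and $\phi(m)$ for the prime-field choices.

```python
from math import gcd


def euler_phi(m):
    """Euler's totient via trial-division factorisation."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result


def mult_order(a, m):
    """Multiplicative order of a modulo m (requires gcd(a, m) = 1)."""
    assert gcd(a, m) == 1
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def ring_params(m):
    """(phi(m), degree d of each factor of Phi_m mod 2, number of slots phi(m)/d)."""
    phi, d = euler_phi(m), mult_order(2, m)
    return phi, d, phi // d


for m in (17425, 13107, 13175):
    print(m, ring_params(m))
```

A factor degree $d$ divisible by 8 (or by 40) is what allows each slot to hold an element of $\mathbb{F}_{2^8}$ (or $\mathbb{F}_{2^{40}}$).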


Recall that one invocation of the Offline Phase produces $\mathrm{sec} \cdot \ell$ triples; thus using the choices above we obtain the following summary table, where "#Trip/#Bits" denotes the number of triples/bits produced per invocation of the Offline Phase.


Field | Adversary type | sec | n_MAC = n_SAC | #Trip/#Bits
K_8   | covert         | 1   | 1             | 512
K_8   | active         | 40  | 5             | 12800
K_40  | covert         | 1   | 1             | 240
K_40  | active         | 40  | 1             | 12800
We ran the Offline Phase on machines with Intel i5 CPUs running at 2.8 GHz with 4 GB of RAM. The ping between machines over the local area network was approximately 0.3 ms. We obtained the execution times given in Table 2 and Table 3, for the two different finite field choices and covert/active security choices, and various numbers of players. We did not run an example with ten players and active adversaries since this took too long. We first ran the Offline Phase in each example to produce a minimum of 5000 triples. Clearly for some parameter sets a single run produced many more than 5000, whilst for others we required multiple runs so as to reach 5000 triples. These results are in Table 2. These runs are compatible with our algebraic S-Box formulation.

This table also presents the average time needed to produce each triple, plus the amortized time to produce the triples per AES invocation (in the case where one wants to evaluate the AES functionality many times). Recall that evaluating the AES functionality with our method requires $10 \cdot 16 \cdot 18 = 2880$ multiplications in total; thus the number of triples needed is $2880 \cdot (n_{SAC} + 1)$, since each multiplication consumes $n_{SAC} + 1$ triples. What is clear from the table is that if one wishes to obtain security against covert adversaries then utilizing the field $K_8$ is preferable. However, for security against active adversaries the field $K_{40}$ is to be preferred.
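The "Offline time per AES block" column is the measured time per triple multiplied by $2880 \cdot (n_{SAC} + 1)$. The sketch below (hypothetical helper names) recomputes a few entries of Table 2 from the total times and triple counts.

```python
def hms_to_s(t):
    h, m, s = map(int, t.split(":"))
    return 3600 * h + 60 * m + s


def s_to_hms(x):
    x = round(x)
    return f"{x // 3600}:{x % 3600 // 60:02d}:{x % 60:02d}"


def offline_per_block(total, produced, n_sac, mults=2880):
    """Amortised Offline time for one AES block: (time per triple) * mults * (n_SAC + 1)."""
    per_triple = hms_to_s(total) / produced
    return s_to_hms(per_triple * mults * (n_sac + 1))


print(offline_per_block("0:01:31", 5120, 1))    # K_8, covert, 2 parties
print(offline_per_block("1:25:57", 12800, 5))   # K_8, active, 2 parties
print(offline_per_block("0:05:08", 5040, 1))    # K_40, covert, 2 parties
```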


Field | Parties | Covert: total time (h:m:s) | Covert: time per triple (s) | Covert: offline time per AES block (h:m:s) | Active: total time (h:m:s) | Active: time per triple (s) | Active: offline time per AES block (h:m:s)

K_8 (No. triples produced: 5120 covert, 12800 active)
K_8  | 2  | 0:01:31 | 0.018 | 0:01:42 | 1:25:57 | 0.403 | 1:56:02
K_8  | 3  | 0:01:32 | 0.018 | 0:01:43 | 1:50:25 | 0.518 | 2:29:03
K_8  | 4  | 0:01:32 | 0.018 | 0:01:43 | 2:14:16 | 0.629 | 3:01:15
K_8  | 5  | 0:01:33 | 0.018 | 0:01:44 | 2:37:30 | 0.738 | 3:32:37
K_8  | 10 | 0:01:48 | 0.021 | 0:02:01 | 4:40:15 | 1.314 | 6:18:20

K_40 (No. triples produced: 5040 covert, 12800 active)
K_40 | 2  | 0:05:08 | 0.061 | 0:05:52 | 0:29:34 | 0.136 | 0:13:18
K_40 | 3  | 0:05:13 | 0.062 | 0:05:57 | 0:38:18 | 0.180 | 0:17:14
K_40 | 4  | 0:05:14 | 0.062 | 0:05:58 | 0:46:02 | 0.216 | 0:20:42
K_40 | 5  | 0:05:17 | 0.063 | 0:06:02 | 0:55:51 | 0.262 | 0:25:07
K_40 | 10 | 0:06:02 | 0.072 | 0:06:53 | 1:39:14 | 0.465 | 0:44:39

Table 2. Offline Run Time Examples For The Algebraic S-Box Method


We then ran an Offline Phase tailored to our BitDecomposition S-Box formulation. Here we need to perform $10 \cdot 16 \cdot 6 = 960$ multiplications, and thus we require $960 \cdot (n_{SAC} + 1)$ triples to evaluate a single block. But we also require $10 \cdot 16 \cdot 16 = 2560$ shared random bits so as to perform two eight-bit BitDecompositions per S-Box invocation. Thus in Table 3 we present run times for a second invocation of the Offline Phase in which we aimed to produce a minimum of 5000 triples and 6600 shared random bits (which is the correct ratio for covert security). Due to the imbalance between triple and bit production, the "Offline Time per AES Block" column needs to be taken as a rough estimate. Again we see that for covert security $K_8$ is preferable, and for active security $K_{40}$ is preferable.
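The per-block resource counts for the two S-Box methods can be tabulated as follows (constants taken from the text; function names are illustrative). For covert security the ratio of bits to triples is $2560/1920 = 4/3$, which is where the target of roughly 6600 bits per 5000 triples comes from.

```python
SBOXES_PER_BLOCK = 10 * 16


def algebraic_requirements(n_sac):
    """Triples per AES block with the algebraic S-Box (18 multiplications each)."""
    return SBOXES_PER_BLOCK * 18 * (n_sac + 1)


def bitdec_requirements(n_sac):
    """(triples, shared random bits) per AES block with the BitDecomposition S-Box."""
    triples = SBOXES_PER_BLOCK * 6 * (n_sac + 1)
    bits = SBOXES_PER_BLOCK * 16      # two eight-bit BitDecompositions per S-Box
    return triples, bits


t, b = bitdec_requirements(1)
print(algebraic_requirements(1), (t, b), round(5000 * b / t))
```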

Field | Players | Covert: total time (h:m:s) | Covert: offline time per AES block (h:m:s) | Active: total time (h:m:s) | Active: offline time per AES block (h:m:s)

K_8 (No. triples/bits: 5120/6556 covert, 12800/12800 active)
K_8  | 2  | 0:02:07 | 0:00:47 | 1:54:42 | 0:51:36
K_8  | 3  | 0:02:10 | 0:00:49 | 2:26:21 | 1:05:51
K_8  | 4  | 0:02:13 | 0:00:50 | 2:56:47 | 1:19:33
K_8  | 5  | 0:02:36 | 0:00:52 | 3:29:49 | 1:34:25
K_8  | 10 | 0:02:33 | 0:00:58 | 6:06:20 | 2:44:51

K_40 (No. triples/bits: 5040/6720 covert, 12800/12800 active)
K_40 | 2  | 0:07:12 | 0:02:43 | 0:36:14 | 0:05:26
K_40 | 3  | 0:07:12 | 0:02:43 | 0:47:30 | 0:07:07
K_40 | 4  | 0:07:19 | 0:02:47 | 0:58:55 | 0:08:57
K_40 | 5  | 0:07:24 | 0:02:49 | 1:10:33 | 0:10:34
K_40 | 10 | 0:08:32 | 0:03:15 | 2:10:03 | 0:19:32

Table 3. Offline Run Time Examples For The S-Box Via BitDecomposition

However, these run times do not seem comparable with the 13 ms per triple estimated by the authors of [14] for the Offline Phase. This discrepancy can easily be explained. The run time estimates in [14] are given for arithmetic circuit evaluation over a finite field of prime characteristic of 64 bits. With the parameter choices in [14] this means one can select parameters for the SHE scheme which enable a 16000-fold SIMD parallelism. For our finite fields of characteristic two the amount of SIMD parallelism in the Offline Phase is much lower than this. To see the difference that using large prime characteristic fields makes to the Offline Phase, we implemented it using the parameters above to obtain the results in Table 4. As can be seen from the table, we produce triples for prime fields of 64 bits in size around twice as fast as the estimates in [14] would predict.


Players | Covert: total number of triples | Covert: total time (h:m:s) | Covert: time per triple (s) | Active: total number of triples | Active: total time (h:m:s) | Active: time per triple (s)
2       | 11970 | 0:00:27 | 0.002 | 669120 | 1:10:48 | 0.006
3       | 11970 | 0:00:27 | 0.002 | 669120 | 1:32:13 | 0.008
4       | 11970 | 0:00:28 | 0.002 | 669120 | 1:55:05 | 0.010
5       | 11970 | 0:00:29 | 0.002 | 669120 | 2:20:42 | 0.013
10      | 11970 | 0:00:31 | 0.002 | 669120 | 4:17:10 | 0.023

Table 4. Offline Run Time Examples For $\mathbb{F}_p$ With $p \approx 2^{64}$


We  now  turn  to  the  Online  Phase;  recall  that  this  itself  comes  in  two  steps  (and  two  variants).  In  the 
first  step  we  evaluate  the  function  itself  (consuming  the  triples  produced  in  the  Offline  Phase),  whereas  in 
the  second  step  we  check  the  MAC  values  and  open  the  final  result.  In  Table  5  we  present  the  run-times  to 
evaluate  the  AES  functionality  for  the  various  parameter  sets  generated  above  using  our  algebraic  formulation 
of  the  S-Box.  These  are  average  run-times  from  all  the  players,  executed  over  20  different  runs.  The  Online 
Phase  was  run  on  the  same  machines  as  in  the  Offline  Phase.  In  Table  6  we  present  the  same  times  using 
the  S-Box  variant  utilizing  the  BitDecomposition  method. 

The  networking  between  players  was  implemented  in  a  point-to-point  fashion  with  each  player  acting  as 
both  a  server  and  a  client.  We  ensured  that  data  was  sent  over  the  sockets  as  soon  as  it  was  ready  by  disabling 
Nagle’s  algorithm  [24].  To  complete  the  function  evaluation  each  player  first  parses  a  program  written  in  a 
specialised  instruction  language.  This  allows  our  implementation  to  take  advantage  of  the  instruction  level 
parallelism  as  described  above  so  as  to  schedule  many  multiplication  operations  to  happen  in  parallel. 
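Disabling Nagle's algorithm is a one-line socket option. A minimal sketch using Python's standard `socket` module (the function name is illustrative, and this is not the implementation's own networking code):

```python
import socket


def make_player_socket():
    """TCP socket with Nagle's algorithm disabled, so small messages are sent immediately."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


sock = make_player_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
sock.close()
```

With many small, latency-bound messages per multiplication round, waiting to coalesce writes would add delay on every round trip.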

Again we see that if security against covert adversaries is the goal then using the field $K_8$ is to be preferred. However, for security against active adversaries the field $K_{40}$ performs better. We also ran the Online Phase in a run which performed ten AES encryptions in parallel. This resulted in only a small improvement in time per AES block over executing just one AES encryption at a time, thus we do not present these figures. Improving the throughput for parallel execution is the subject of future research.

Field | Players | Covert: function evaluation | Covert: checking step | Covert: total time | Active: function evaluation | Active: checking step | Active: total time
K_8  | 2  | 0.284 | 0.017 | 0.301 | 1.319 | 0.031 | 1.350
K_8  | 3  | 0.307 | 0.062 | 0.369 | 1.381 | 0.035 | 1.416
K_8  | 4  | 0.316 | 0.027 | 0.343 | 1.422 | 0.028 | 1.450
K_8  | 5  | 0.344 | 0.034 | 0.378 | 1.461 | 0.018 | 1.479
K_8  | 10 | 0.444 | 0.010 | 0.454 | 1.659 | 0.023 | 1.682
K_40 | 2  | 0.449 | 0.012 | 0.461 | 0.460 | 0.021 | 0.481
K_40 | 3  | 0.486 | 0.022 | 0.498 | 0.475 | 0.025 | 0.500
K_40 | 4  | 0.490 | 0.042 | 0.532 | 0.486 | 0.055 | 0.541
K_40 | 5  | 0.508 | 0.037 | 0.544 | 0.510 | 0.026 | 0.536
K_40 | 10 | 0.765 | 0.021 | 0.786 | 0.672 | 0.017 | 0.689

Table 5. Online Phase Runtime Examples (all in seconds) - Algebraic S-Box

Field | Players | Covert: function evaluation | Covert: checking step | Covert: total time | Active: function evaluation | Active: checking step | Active: total time
K_8  | 2  | 0.156 | 0.009 | 0.165 | 0.569 | 0.011 | 0.580
K_8  | 3  | 0.178 | 0.008 | 0.186 | 0.616 | 0.019 | 0.635
K_8  | 4  | 0.169 | 0.015 | 0.184 | 0.620 | 0.015 | 0.635
K_8  | 5  | 0.173 | 0.019 | 0.192 | 0.727 | 0.019 | 0.746
K_8  | 10 | 0.211 | 0.015 | 0.226 | 0.722 | 0.044 | 0.766
K_40 | 2  | 0.260 | 0.006 | 0.266 | 0.256 | 0.004 | 0.260
K_40 | 3  | 0.303 | 0.009 | 0.312 | 0.279 | 0.011 | 0.290
K_40 | 4  | 0.303 | 0.010 | 0.313 | 0.287 | 0.029 | 0.316
K_40 | 5  | 0.319 | 0.022 | 0.341 | 0.319 | 0.016 | 0.335
K_40 | 10 | 0.399 | 0.016 | 0.415 | 0.387 | 0.027 | 0.414

Table 6. Online Phase Runtime Examples (all in seconds) - S-Box Via BitDecomposition

Overall, the two methods of AES evaluation are roughly comparable, with the method via BitDecomposition being faster, and significantly faster when one also takes into account the associated cost of the Offline Phase. However, as remarked previously, this method does not result in a generic Offline Phase, since the Offline Phase needs to "know" the expected ratio of bits to triples that it needs to produce for the actual function which will be evaluated in the Online Phase.

In summary, we have presented the first experimental results for running MPC protocols with large numbers of players (10, as opposed to the four or fewer of prior work), and for a dishonest majority of active or covert adversaries (as opposed to threshold adversaries). We expect our reported execution times to fall as dramatically as those for two-party MPC protocols have in the last couple of years. Thus we can expect actively/covertly secure MPC protocols for dishonest majority to be within reach of some practical applications within a few years.
A Generalized BitDecomposition

In this section, we describe a generalized variant of BitDecomposition, which includes bit-decomposition in $K_8$ as a subfield of $K_{40}$.

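Let $f : V \to W$ be a linear map between two vector spaces over $\mathbb{F}_2$. Then $\langle r \rangle$ and $\langle f(r) \rangle$ for a random element $r \in V$ allow one to securely compute $\langle f(x) \rangle$ for any $\langle x \rangle$ by computing and opening $\langle x + r \rangle$, and then computing $\langle f(x) \rangle = f(x + r) + \langle f(r) \rangle$. This is correct because $f(x + r) + f(r) = f(x) + 2 \cdot f(r) = f(x)$ over $\mathbb{F}_2$.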
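A minimal sketch of this generic protocol, assuming XOR sharing of bytes (viewed as vectors in $\mathbb{F}_2^8$), no MACs, and a dealer standing in for the offline generation of $\langle r \rangle$ and $\langle f(r) \rangle$. As an example linear map it uses the linear part of the AES affine transformation; the function names are hypothetical.

```python
import secrets


def rotl8(v, k):
    return ((v << k) | (v >> (8 - k))) & 0xFF


def affine_linear_part(y):
    """An F_2-linear map on bytes: the linear part of the AES affine transformation."""
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4)


def share(v, n):
    s = [secrets.randbelow(256) for _ in range(n - 1)]
    last = v
    for x in s:
        last ^= x
    return s + [last]


def reveal(shares):
    v = 0
    for x in shares:
        v ^= x
    return v


def apply_linear(f, x_sh, r_sh, fr_sh):
    """Open x + r, then <f(x)> = f(x + r) + <f(r)>; the public term goes to party 0."""
    c = reveal([a ^ b for a, b in zip(x_sh, r_sh)])
    fc = f(c)
    return [fr ^ (fc if p == 0 else 0) for p, fr in enumerate(fr_sh)]


r = secrets.randbelow(256)  # dealt offline in the real protocol
out = apply_linear(affine_linear_part, share(0x53, 4), share(r, 4), share(affine_linear_part(r), 4))
print(reveal(out) == affine_linear_part(0x53))
```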

For bit-decomposition in $K_8$, define $f : K_8 \to \mathbb{F}_2^8$ by

$$f\left(\sum_{i=0}^{7} a_i \cdot X^i\right) := (a_0, \ldots, a_7).$$

This function is clearly linear over $\mathbb{F}_2$. In the offline phase, it suffices to generate $\langle (r_0, \ldots, r_7) \rangle = (\langle r_0 \rangle, \ldots, \langle r_7 \rangle)$ for random bits $r_0, \ldots, r_7$, because $\langle r \rangle = \sum_{i=0}^{7} \langle r_i \rangle \cdot X^i$ can be computed locally. Note that $r_0, \ldots, r_7$ are understood as elements of $K_8$, like all variables in the protocol over $K_8$. Therefore, one has to make sure that they are in fact 0 or 1 and not some other element of $K_8$. This is done by modifying the Offline Phase: each party encrypts a random bit and proves that it is actually a bit. The homomorphic structure of the NIZKPoKs makes this straightforward. As with the triple components, the secret bit is defined as the sum of all inputs, and the secret sharing with MAC is computed by multiplication via the homomorphic property of the ciphertexts and threshold decryption.
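The masking idea can be illustrated with a small Python sketch for $K_8$. It is only a sketch under stated assumptions: elements of $\mathbb{F}_{2^8}$ are stored as 8-bit integers in the polynomial basis, the modulus $X^8 + X^4 + X^3 + X + 1$ (the AES polynomial) is assumed, MACs, NIZKPoKs and encryption are omitted, and a trusted dealer stands in for the Offline Phase. The helper names (`gf_mul`, `share`, `bit_decompose`) are hypothetical.

```python
import secrets

AES_POLY = 0x11B  # assumed modulus X^8 + X^4 + X^3 + X + 1 for K_8
N_PARTIES = 3     # hypothetical number of parties


def gf_mul(a, b):
    """Multiply two elements of F_{2^8} (8-bit ints, polynomial basis)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
    return result


def share(value):
    """Additively share an element of K_8; addition in characteristic 2 is XOR."""
    shares = [secrets.randbits(8) for _ in range(N_PARTIES - 1)]
    last = value
    for s in shares:
        last ^= s
    return shares + [last]


def reconstruct(shares):
    acc = 0
    for s in shares:
        acc ^= s
    return acc


def bit_decompose(x_shares):
    """Return sharings <a_0>, ..., <a_7> of the bits of the shared x."""
    # Offline phase (trusted-dealer stand-in): sharings of 8 random bits r_i.
    r_bit_shares = [share(secrets.randbits(1)) for _ in range(8)]
    # Local computation: <r> = sum_i <r_i> * X^i, done share by share.
    r_shares = [0] * N_PARTIES
    for i, bit_sh in enumerate(r_bit_shares):
        for p in range(N_PARTIES):
            r_shares[p] ^= gf_mul(bit_sh[p], 1 << i)
    # Open c = x + r; c is uniform because r is uniform in K_8.
    c = reconstruct([xs ^ rs for xs, rs in zip(x_shares, r_shares)])
    # <a_i> = c_i + <r_i>: the public bit c_i is added by one party only.
    out = []
    for i, bit_sh in enumerate(r_bit_shares):
        a_sh = list(bit_sh)
        a_sh[0] ^= (c >> i) & 1
        out.append(a_sh)
    return out
```

For example, reconstructing each output of `bit_decompose(share(0xA7))` yields `[1, 1, 1, 0, 0, 1, 0, 1]`, the bits of `0xA7` with the least significant first. Only `c` is ever opened, and since `r` is uniform it reveals nothing about `x`.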

We now move to bit-decomposition for $K_8$ embedded in $K_{40}$. Let $\iota$ denote the embedding of $K_8$ in $K_{40}$. This embedding is a field homomorphism and thus a linear map between vector spaces over $\mathbb{F}_2$. The bit-decomposition for $\iota(K_8)$ is defined by $f : \iota(K_8) \to \mathbb{F}_2^8$,

$$f\left(\iota\left(\sum_{i=0}^{7} a_i \cdot X^i\right)\right) := (a_0, \ldots, a_7).$$

Again, $f$ is linear over $\mathbb{F}_2$, and thus the protocol explained above is applicable. As in the case of $K_8$, it suffices to generate eight bits $\langle r_0 \rangle, \ldots, \langle r_7 \rangle$ in the offline phase. There is one peculiarity in this case: we defined $f$ over $\iota(K_8) \subset K_{40}$, not over all of $K_{40}$. That is, we assume that the input of $f$ is an element of $\iota(K_8)$, not an arbitrary element of $K_{40}$. This is guaranteed in our application, but may not be true in general.

In general the function $f$ can easily be extended to $f' : K_{40} \to \mathbb{F}_2^8$ by defining $f'(x) := f(p_{\iota(K_8)}(x))$, where $p_{\iota(K_8)}$ denotes the natural projection to $\iota(K_8)$. However, masking an arbitrary element $x \in K_{40}$ with a random element of $\iota(K_8)$ reveals $x - p_{\iota(K_8)}(x)$. Therefore, one has to mask $x$ additionally with a random $r' \in K_{40}/\iota(K_8)$ before opening it, i.e., compute and open $\langle x + \iota(\sum_{i=0}^{7} r_i \cdot X^i) + r' \rangle$. As above, the homomorphic structure of the NIZKPoKs allows $\langle r' \rangle$ to be generated at the same cost as a random element.

The above discussion of $\mathbb{F}_{2^8}$ and $\mathbb{F}_{2^{40}}$ can be extended to an arbitrary field $\mathbb{F}_{2^n}$ and a subfield $\mathbb{F}_{2^m}$ if required.


B  Figures 


Online  Protocol 

Initialize: We assume that i) the parties have already invoked the Offline Phase to obtain a sufficient number of multiplication triples $(\langle a \rangle, \langle b \rangle, \langle c \rangle)$; ii) each party holds its share of the global MAC keys $\alpha_{j,i}$; and iii) the parties have obtained (by some means) the $\langle \cdot \rangle$ sharing of the input values to the computation.

1. The parties execute Init() to initialize their local copy of the hash function $H_1$, and the values $\mathsf{seed}_i$, $\mathsf{cnt}_i$, $\alpha_{j,i}$, and $\gamma_{j,i}$.

2. The parties generate global random values $t_j \in \mathbb{F}$, for $j = 1, \ldots, n_{SAC}$, by computing $(t_1 \| \ldots \| t_{n_{SAC}}) = H_2(1 \| \mathsf{seed}_1 \| \ldots \| \mathsf{seed}_n)$.

The  following  steps  are  performed  according  to  the  circuit  being  evaluated. 

Add: To add two representations $\langle x \rangle$, $\langle y \rangle$, the parties locally compute $\langle x \rangle + \langle y \rangle$.

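Multiply: To multiply $\langle x \rangle$, $\langle y \rangle$ the parties do the following:

1. They take $n_{SAC} + 1$ triples $(\langle a \rangle, \langle b \rangle, \langle c \rangle)$, $(\langle f_j \rangle, \langle g_j \rangle, \langle h_j \rangle)_{j=1}^{n_{SAC}}$ from the set of available ones (and update this list by deleting these triples).

2. For $j = 1, \ldots, n_{SAC}$ player $P_i$ computes

(a) $\rho_j = \mathsf{PartialOpen}(t_j \cdot \langle a \rangle - \langle f_j \rangle)$.

(b) $\sigma_j = \mathsf{PartialOpen}(\langle b \rangle - \langle g_j \rangle)$.

(c) $\tau_j = \mathsf{PartialOpen}(t_j \cdot \langle c \rangle - \langle h_j \rangle - \sigma_j \cdot \langle f_j \rangle - \rho_j \cdot \langle g_j \rangle - \sigma_j \cdot \rho_j)$.

(d) If $\tau_j \neq 0$ then abort.

3. If no player has aborted, the triple $(\langle a \rangle, \langle b \rangle, \langle c \rangle)$ is accepted, and the parties execute $\epsilon = \mathsf{PartialOpen}(\langle x \rangle - \langle a \rangle)$ and $\delta = \mathsf{PartialOpen}(\langle y \rangle - \langle b \rangle)$.

4. The parties locally compute the answer $\langle z \rangle = \langle c \rangle + \epsilon \cdot \langle b \rangle + \delta \cdot \langle a \rangle + \epsilon \cdot \delta$.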
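
BitDecomposition: This produces the BitDecomposition of a shared value $\langle a \rangle$. We present a simplified protocol for when $q = 2^k$.

1. $c = \mathsf{PartialOpen}(\langle a \rangle + \sum_{i=0}^{k-1} \langle r_i \rangle \cdot X^i)$.

2. Write $c = \sum_{i=0}^{k-1} c_i \cdot X^i$.

3. Output $\langle a_i \rangle = c_i + \langle r_i \rangle$.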

Output: We enter this stage when the players have $\langle y \rangle$ for the output value $y$, but this value has not yet been opened. This output value is only correct if players have behaved honestly, which we now need to check. Let $a_1, \ldots, a_T$ be all values publicly opened so far, where $\langle a_k \rangle = (\delta_k, (a_{k,1}, \ldots, a_{k,n}), (\gamma_{j,1}(a_k), \ldots, \gamma_{j,n}(a_k))_{j=1}^{n_{MAC}})$.

1. Player $P_i$ computes $(\mathsf{comm}_i, r_i) = \mathsf{Commit}(y_i \| (\gamma_{j,i}(y))_{j=1}^{n_{MAC}})$.

2. The players execute $\{\mathsf{comm}_1, \ldots, \mathsf{comm}_n\} = \mathsf{Broadcast}(\mathsf{comm}_i)$.

3. For $j = 1, \ldots, n_{MAC}$ the players execute

(a) Player $P_i$ computes $(\mathsf{comm}_{j,i}, r_{j,i}) \leftarrow \mathsf{Commit}(\gamma_{j,i})$.

(b) Execute $\{\mathsf{comm}_{j,1}, \ldots, \mathsf{comm}_{j,n}\} = \mathsf{Broadcast}(\mathsf{comm}_{j,i})$.

(c) Execute $\{\alpha_{j,1}, \ldots, \alpha_{j,n}\} = \mathsf{Broadcast}(\alpha_{j,i})$.

(d) Player $P_i$ computes $\alpha_j = \alpha_{j,1} + \cdots + \alpha_{j,n}$.

(e) All players open $\mathsf{comm}_{j,i}$ to $\gamma_{j,i}$ (via a call to Broadcast); the commitments are checked, and if Open returns $\perp$ for a player then it aborts.

(f) Each player verifies that $\alpha_j \cdot a_j = \sum_{i=1}^{n} \gamma_{j,i}$ for his own values of $a_j$.

4. The players execute Verify() to confirm that all broadcasts have been valid.

5. To obtain the output value $y$, the commitments to $y_i, \gamma_{j,i}(y)$ are opened via each player transmitting their openings to each player, and each player transmitting what it receives to each other player to check consistency.

6. Now $y$ is defined as $y := \sum_{i=1}^{n} y_i$ and each player checks that $\alpha_j \cdot (y + \delta_y) = \sum_{i=1}^{n} \gamma_{j,i}(y)$, for $j = 1, \ldots, n_{MAC}$.

Fig.  2.  The  (slightly)  modified  SPDZ  online  phase. 
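The sacrificing and multiplication steps of the Multiply procedure in Fig. 2 can be checked numerically with the sketch below. Assumptions: it works over a prime field $\mathbb{F}_P$ with a hypothetical $P = 2^{61} - 1$ rather than the characteristic-two field used for AES (the identities used hold in any field), MAC shares are omitted, triples come from a trusted-dealer stand-in, and the challenge $t$ is passed in directly instead of being derived via $H_2$. All helper names are hypothetical.

```python
import secrets

P = 2**61 - 1  # hypothetical prime field; Fig. 2 itself works over F_{2^40}
N = 3          # hypothetical number of parties


def share(v):
    s = [secrets.randbelow(P) for _ in range(N - 1)]
    return s + [(v - sum(s)) % P]


def partial_open(sh):
    return sum(sh) % P


def lin(*terms, const=0):
    """Shares of sum(coeff * x) + const; const is added by party 0 only."""
    out = [0] * N
    for coeff, sh in terms:
        for i in range(N):
            out[i] = (out[i] + coeff * sh[i]) % P
    out[0] = (out[0] + const) % P
    return out


def dealer_triple():
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)


def sacrifice_ok(abc, fgh, t):
    """Steps 2(a)-(d): check <a>,<b>,<c> against <f>,<g>,<h> with challenge t."""
    a, b, c = abc
    f, g, h = fgh
    rho = partial_open(lin((t, a), (-1, f)))
    sigma = partial_open(lin((1, b), (-1, g)))
    tau = partial_open(lin((t, c), (-1, h), (-sigma, f), (-rho, g),
                           const=-sigma * rho))
    return tau == 0


def multiply(x, y, abc, checks):
    """Steps 1-4: sacrifice against each (fgh, t), then Beaver-multiply."""
    for fgh, t in checks:
        if not sacrifice_ok(abc, fgh, t):
            raise RuntimeError("abort: sacrifice check failed")
    a, b, c = abc
    eps = partial_open(lin((1, x), (-1, a)))
    delta = partial_open(lin((1, y), (-1, b)))
    return lin((1, c), (eps, b), (delta, a), const=eps * delta)
```

Expanding $\tau_j$ with $\rho_j = t_j a - f_j$ and $\sigma_j = b - g_j$ gives $\tau_j = t_j (c - ab) - (h_j - f_j g_j)$, so a triple with $c \neq ab$ is caught unless the adversary guesses $t_j$; a bad triple with $c = ab + 1$ checked against a good one with $t = 5$ yields $\tau = 5$.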
Abstract. SPDZ (pronounced “Speedz”) is the nickname of the MPC protocol of Damgård et al. from Crypto 2012. SPDZ provided various efficiency innovations on both the theoretical and practical sides compared to previous work in the preprocessing model. In this paper we resolve a number of open problems with SPDZ and present several theoretical and practical improvements to the protocol.

In detail, we start by designing and implementing a covertly secure key generation protocol for obtaining a BGV public key and a shared associated secret key. In prior work this was assumed to be provided by a given setup functionality. Protocols for generating such shared BGV secret keys are likely to be of wider applicability than the SPDZ protocol alone.

We then construct both a covertly secure and an actively secure preprocessing phase, both of which compare favourably with previous work in terms of efficiency and provable security.

We also build a new online phase, which solves a major problem of the SPDZ protocol: prior to this work, preprocessed data could be used for only one function evaluation and then had to be recomputed from scratch for the next evaluation, whereas our online phase can support reactive functionalities. This improvement comes mainly from the fact that our construction does not require players to reveal the MAC keys to check the correctness of MAC'd values.

Since our focus is also on practical instantiations, our implementation offloads as much computation as possible into the preprocessing phase, thus resulting in a faster online phase. Moreover, a better analysis of the parameters of the underlying cryptosystem and a more specific choice of the field in which computation is performed allow us to obtain a better optimized implementation. Further improvements come from our construction being in the random oracle model and from the practical implementation being multi-threaded.


This article is based on an earlier article: ESORICS 2013, pp. 1-18, Springer LNCS 8134, 2013, http://dx.doi.org/10.1007/978-3-642-40203-6_1.
1 Introduction


For many decades multi-party computation (MPC) had been a predominantly theoretical endeavour in cryptography, but in recent years interest has arisen on the practical side. This has resulted in various implementation improvements, and such protocols are becoming more applicable to practical situations. A key part of this transformation from theory to practice lies in adapting theoretical protocols and applying implementation techniques so as to significantly improve performance, whilst not sacrificing the level of security required by real-world applications. This paper follows this modern, more practical trend.

Early applied work on MPC focused on protocols secure against passive adversaries, both in the case of two-party protocols based on Yao circuits [19] and in that of many-party protocols based on secret sharing techniques [5, 10, 25]. Only in recent years has work shifted towards achieving active security [17, 18, 24], which appears to come at vastly increased cost when dealing with more than two players. On the other hand, in real applications active security may be more stringent than one would actually require. In [2, 3] Aumann and Lindell introduced the notion of covert security; in this security model an adversary who deviates from the protocol is detected with high (but not necessarily overwhelming) probability, say 90%, which still gives the adversary an incentive to behave honestly. Active security achieves the same effect, but the adversary can succeed in cheating only with negligible probability. There is a strong case to be made, see [2, 3], that covert security is a “good enough” security level for practical applications; thus in this work we focus on covert security, but we also provide solutions with active security.

As our starting point we take the protocol of [14] (dubbed SPDZ, and pronounced Speedz). In [14] this protocol is secure against active static adversaries in the standard model and tolerates corruption of $n - 1$ of the $n$ parties. The SPDZ protocol follows the preprocessing model: in an offline phase some shared randomness is generated, but neither the function to be computed nor the inputs need be known; in an online phase the actual secure computation is performed. One of the main advantages of the SPDZ protocol is that the performance of the online phase scales linearly with the number of players, and the basic operations are almost as cheap as those used in the passively secure protocols based on Shamir secret sharing. Thus, it offers the possibility of being both more flexible and more secure than Shamir-based protocols, while still maintaining low computational cost.

In [12] the authors present an implementation report on an adaptation of the SPDZ protocol in the random oracle model, and show performance figures for both the offline and online phases, for both an actively secure variant and a covertly secure variant. The implementation is over a finite field of characteristic two, since the focus is on providing a benchmark for evaluation of the AES circuit (a common benchmark application in MPC [24, 11]).

Our Contributions: In this work we present a number of contributions which further extend the ability of the SPDZ protocol to deal with the type of application one is likely to see in practice. All our theorems are proved in the UC model, and in most cases the protocols make use of some predefined ideal functionalities. We give protocols implementing most of these functionalities, the only exception being the functionality that provides access to a random oracle. This is implemented using a hash function, and so the actual protocol is only secure in the Random Oracle Model. We back up these improvements with an implementation, which we report on.

Our contributions come in two flavours. In the first flavour we present a number of improvements and extensions to the basic underlying SPDZ protocol. These protocol improvements are supported with associated security models and proofs. Our second flavour of improvements is at the implementation layer, and brings standard techniques from applied cryptography to bear on MPC.

In more detail, our protocol enhancements, in descending order of importance, are as follows:

1. In the online phase of the original SPDZ protocol the parties are required to reveal their shares of a global MAC key in order to verify that the computation has been performed correctly. This is a major problem in practical applications, since it means that secret-shared data we did not reveal cannot be re-used in later applications. Our protocol adopts a method that accomplishes the same task without needing to open the underlying MAC key. This means we can go on computing on any secret-shared data we have, so we can support general reactive computation rather than just secure function evaluation. A further advantage of this technique is that some of the verification we need (the so-called “sacrificing” step) can be moved into the offline phase, providing additional performance improvements in the online phase.
2. In the original SPDZ protocol [12, 14] the authors assume a “magic” key generation phase for the production of the distributed Somewhat Homomorphic Encryption (SHE) scheme public/private keys required by the offline phase. The authors claim this can be accomplished using standard generic MPC techniques, which are of course expensive. In this work we present a key generation protocol for the BGV [6] SHE scheme which is secure against covert adversaries. In addition, we generate a “full” BGV key which supports the modulus switching and key switching used in [16]. This new sub-protocol may be of independent interest in other applications which require distributed decryption in an SHE/FHE scheme.

3. In [12] the modification to covert security was essentially ad hoc, and resulted in a very weak form of covert security. In addition, no security proofs or model were given to justify the claimed security. In this work we present a completely different approach to achieving covert security; we provide an extensive security model and full proofs for the modified offline phase (and the key generation protocol mentioned above).

4.  We  introduce  a  new  approach  to  obtain  full  active  security  in  the  offline  phase.  In  [14]  active  security  was  obtained 
via  the  use  of  specially  designed  ZKPoKs.  In  this  work  we  present  a  different  technique,  based  on  a  method  used 
in  [21].  This  method  has  running  time  similar  to  the  ZKPoK  approach  utilized  in  [14],  but  it  allows  us  to  give 
much  stronger  guarantees  on  the  ciphertexts  produced  by  corrupt  players:  the  gap  between  the  size  of  “noise” 
honest  players  put  into  ciphertexts  and  what  we  can  force  corrupt  players  to  use  was  exponential  in  the  security 
parameter  in  [14],  and  is  essentially  linear  in  our  solution.  This  allows  us  to  choose  smaller  parameters  for  the 
underlying  cryptosystem  and  so  makes  other  parts  of  the  protocol  more  efficient. 

It is important to understand that by combining these contributions in different ways, we can obtain two different general MPC protocols. First, since our new online phase still has full active security, it can be combined with our new approach to active security in the offline phase. This results in a protocol that is “syntactically similar” to the one from [14]: it has full active security assuming access to a functionality for key generation. However, it has enhanced functionality and performance compared to [14], in that it can securely compute reactive functionalities. Second, we can combine our covertly secure protocols for key generation and the offline phase with the online phase to get a protocol that has covert security throughout and does not assume that key generation is given for free.

Our  covert  solutions  all  make  use  of  the  same  technique  to  move  from  passive  to  covert  security,  while  avoiding 
the  computational  cost  of  performing  zero-knowledge  proofs.  In  [12]  covert  security  is  obtained  by  only  checking  a 
fraction  of  the  resulting  proofs,  which  results  in  a  weak  notion  of  covert  security  (the  probability  of  a  cheater  being 
detected  cannot  be  made  too  large).  In  this  work  we  adopt  a  different  approach,  akin  to  the  cut-and-choose  paradigm. 
We  require  parties  to  commit  to  random  seeds  for  a  number  of  runs  of  a  given  sub-protocol,  then  all  the  runs  are 
executed  in  parallel,  finally  all  bar  one  of  the  runs  are  “opened”  by  the  players  revealing  their  random  seeds.  If  all 
opened  runs  are  shown  to  have  been  performed  correctly  then  the  players  assume  that  the  single  un-opened  run  is  also 
correctly  executed. 
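The cut-and-choose check can be sketched as follows, purely as an illustration. We assume a hypothetical deterministic stand-in `run_output` for one run of a sub-protocol, use SHA-256 of a high-entropy seed as the commitment, and let the caller fix which run stays unopened. If a cheater corrupts exactly one of $k$ runs and the unopened run is chosen uniformly after the commitments are fixed, the cheating is caught with probability $1 - 1/k$, e.g. 90% for $k = 10$.

```python
import hashlib
import secrets


def run_output(seed):
    """Hypothetical stand-in for the transcript of one run driven by `seed`."""
    return hashlib.sha256(b"run|" + seed).digest()


def cut_and_choose(k, cheat_in, unopened):
    """Return True if cheating in run `cheat_in` (or None) is detected."""
    # Commit phase: the player commits to one random seed per run.
    seeds = [secrets.token_bytes(32) for _ in range(k)]
    commitments = [hashlib.sha256(s).digest() for s in seeds]
    # All k runs are executed in parallel; a cheater deviates in one run.
    outputs = [run_output(s) for s in seeds]
    if cheat_in is not None:
        outputs[cheat_in] = secrets.token_bytes(32)
    # Open every run except `unopened` and recompute it from its seed.
    for j in range(k):
        if j == unopened:
            continue
        if hashlib.sha256(seeds[j]).digest() != commitments[j]:
            return True
        if run_output(seeds[j]) != outputs[j]:
            return True
    return False  # the single unopened run is accepted as correct
```

With `k = 10`, a cheat in run 3 escapes only when run 3 is the one left unopened, i.e. in one of the ten possible choices of `unopened`.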

Note  that  since  these  checks  take  place  in  the  offline  phase  where  the  inputs  are  not  yet  available,  we  obtain  the 
strongest  flavour  of  covert  security  defined  in  [2],  where  the  adversary  learns  nothing  new  if  he  decides  to  try  to  cheat 
and  is  caught. 

A  pleasing  side-effect  of  the  replacement  of  zero-knowledge  proofs  with  our  custom  mechanism  to  obtain  covert 
security  is  that  the  offline  phase  can  be  run  in  much  smaller  “batches”.  In  [12, 14]  the  need  to  amortize  the  cost  of 
the  expensive  zero-knowledge  proofs  meant  that  the  players  on  each  iteration  of  the  offline  protocol  executed  a  large 
computation,  which  produced  a  large  number  of  multiplication  triples  [4]  (in  the  millions).  With  our  new  technique 
we  no  longer  need  to  amortize  executions  as  much,  and  so  short  runs  of  the  offline  phase  can  be  executed  if  so  desired; 
producing  only  a  few  thousand  triples  per  run. 

Our second flavour of improvements, at the implementation layer, is more mundane, being mainly of an implementation nature.

1. We focus on the more practical application scenario of developing MPC where the base arithmetic domain is a finite field of characteristic $p > 2$. The reader should think of $p \approx 2^{32}, 2^{64}, 2^{128}$ and the type of operations envisaged in [8, 9], etc. For such applications we can offload a lot of computation into the SPDZ offline phase, and we present the necessary modifications to do so.

2.  Parameters  for  the  underlying  BGV  scheme  are  chosen  using  the  analysis  used  in  [16]  rather  than  the  approach 
used  in  [14].  In  addition  we  pick  specific  parameters  which  enable  us  to  optimize  for  our  application  to  SPDZ 
with  the  choices  of  p  above. 
3. We assume the random oracle model throughout; this allows us to simplify a number of the sub-procedures in [14], especially those related to aspects of the protocol which require commitments.

4. The underlying modular arithmetic is implemented using Montgomery arithmetic [20], in contrast to earlier work which used standard libraries, such as NTL, to provide such operations.

5. Removing the need for libraries such as NTL means the entire protocol can be implemented in a multi-threaded manner, and so it can make use of the multiple cores on modern microprocessors.

This  extended  abstract  presents  the  main  ideas  behind  our  improvements  and  details  of  our  implementation.  For  a 
full  description  including  details  of  the  associated  sub-procedures,  security  models  and  associated  full  security  proofs 
please  see  the  full  version  of  this  paper  at  [13]. 


2  SPDZ  Overview 


We now present the main components of the SPDZ protocol; in this section, unless otherwise specified, we are simply recapping prior work. Throughout the paper we assume the computation to be performed by $n$ players over a fixed finite field $\mathbb{F}_p$ of characteristic $p$. The high-level idea of the online phase is to compute a function represented as a circuit, where privacy is obtained by additively secret sharing the inputs and outputs of each gate, and correctness is guaranteed by adding additive secret sharings of MACs on the inputs and outputs of each gate. In more detail, each player $P_i$ has a uniform share $\alpha_i \in \mathbb{F}_p$ of a secret value $\alpha = \alpha_1 + \cdots + \alpha_n$, thought of as a fixed MAC key. We say that a data item $a \in \mathbb{F}_p$ is $\langle \cdot \rangle$-shared if $P_i$ holds a tuple $(a_i, \gamma(a)_i)$, where $a_i$ is an additive secret sharing of $a$, i.e. $a = a_1 + \cdots + a_n$, and $\gamma(a)_i$ is an additive secret sharing of $\gamma(a) := \alpha \cdot a$, i.e. $\gamma(a) = \gamma(a)_1 + \cdots + \gamma(a)_n$.

For readers familiar with [14], this is a simpler MAC definition. In particular we have dropped $\delta_a$ from the MAC definition; this value was only used to add or subtract public data to or from shares. In our case $\delta_a$ becomes superfluous, since there is a straightforward way of computing a MAC of a public value $a$ by defining $\gamma(a)_i \leftarrow a \cdot \alpha_i$.
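The $\langle \cdot \rangle$-representation and the MAC of a public value can be illustrated with a short sketch. Assumptions: a hypothetical prime $P = 2^{61} - 1$, a trusted dealer that knows $\alpha$ (standing in for the offline phase), and a `mac_ok` check that, for brevity, uses $\alpha$ directly. The protocol in this paper is designed precisely so that $\alpha$ need not be opened, so `mac_ok` only verifies the algebra. Helper names are hypothetical.

```python
import secrets

P = 2**61 - 1  # hypothetical prime; the paper uses p of about 32, 64 or 128 bits
N = 3          # hypothetical number of players


def additive_share(v):
    s = [secrets.randbelow(P) for _ in range(N - 1)]
    return s + [(v - sum(s)) % P]


# Player P_i holds alpha_shares[i]; ALPHA is known only to the dealer stand-in.
alpha_shares = additive_share(secrets.randbelow(P))
ALPHA = sum(alpha_shares) % P


def mac_share(a):
    """Dealer stand-in: player i gets (a_i, gamma(a)_i) with gamma(a) = alpha * a."""
    return list(zip(additive_share(a), additive_share(ALPHA * a % P)))


def add(x, y):
    """Linear operation: purely local, no interaction."""
    return [((xa + ya) % P, (xg + yg) % P) for (xa, xg), (ya, yg) in zip(x, y)]


def add_public(x, c):
    """Add public c: P_0 adds c to its share; each P_i adds c * alpha_i to its MAC share."""
    return [((xa + (c if i == 0 else 0)) % P, (xg + c * alpha_shares[i]) % P)
            for i, (xa, xg) in enumerate(x)]


def mac_ok(x):
    """Illustration only: checks alpha * a == sum of MAC shares."""
    a = sum(s for s, _ in x) % P
    return sum(g for _, g in x) % P == ALPHA * a % P
```

The MAC of the public constant needs no $\delta_a$: since $\gamma(a + c) = \gamma(a) + c \cdot \alpha$, each player simply adds $c \cdot \alpha_i$ to its MAC share.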

During the protocol various values which are $\langle \cdot \rangle$-shared are “partially opened”, i.e. the associated values $a_i$ are revealed, but not the associated shares of the MAC. Note that linear operations (addition and scalar multiplication) can be performed on the $\langle \cdot \rangle$-sharings with no interaction required. Computing multiplications, however, is not straightforward, as we describe below.

The goal of the offline phase is to produce a set of “multiplication triples”, which allow players to compute products. These are a list of sets of three $\langle \cdot \rangle$-sharings $\{\langle a \rangle, \langle b \rangle, \langle c \rangle\}$ such that $c = a \cdot b$. In this paper we extend the offline phase to also produce “square pairs”, i.e. a list of pairs of $\langle \cdot \rangle$-sharings $\{\langle a \rangle, \langle b \rangle\}$ such that $b = a^2$, and “shared bits”, i.e. a list of single shares $\langle a \rangle$ such that $a \in \{0, 1\}$.

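In the online phase these lists are consumed as MPC operations are performed. In particular, to multiply two $\langle \cdot \rangle$-sharings $\langle x \rangle$ and $\langle y \rangle$ we take a multiplication triple $\{\langle a \rangle, \langle b \rangle, \langle c \rangle\}$ and partially open $\langle x \rangle - \langle a \rangle$ to obtain $\epsilon$ and $\langle y \rangle - \langle b \rangle$ to obtain $\delta$. The sharing of $z = x \cdot y$ is computed from $\langle z \rangle \leftarrow \langle c \rangle + \epsilon \cdot \langle b \rangle + \delta \cdot \langle a \rangle + \epsilon \cdot \delta$.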

The reason for introducing square pairs is that squaring a value can then be computed more efficiently, as follows. To square the sharing $\langle x \rangle$ we take a square pair $\{\langle a \rangle, \langle b \rangle\}$ and partially open $\langle x \rangle - \langle a \rangle$ to obtain $\epsilon$. We then compute the sharing of $z = x^2$ from $\langle z \rangle \leftarrow \langle b \rangle + 2 \cdot \epsilon \cdot \langle x \rangle - \epsilon^2$. Finally, the “shared bits” are useful in computing high-level operations such as comparison, bit-decomposition, and fixed and floating point operations as in [1, 8, 9].
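Both online operations can be checked with the following sketch. Assumptions: a hypothetical prime $P = 2^{61} - 1$, MAC shares omitted, and a trusted-dealer stand-in for the offline phase that produces triples and square pairs. The constant terms $\epsilon \cdot \delta$ and $-\epsilon^2$ are added by one designated player only. Helper names are hypothetical.

```python
import secrets

P = 2**61 - 1  # hypothetical prime field F_P
N = 3          # hypothetical number of players


def share(v):
    s = [secrets.randbelow(P) for _ in range(N - 1)]
    return s + [(v - sum(s)) % P]


def partial_open(sh):
    return sum(sh) % P


def dealer_triple():
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)


def dealer_square_pair():
    a = secrets.randbelow(P)
    return share(a), share(a * a % P)


def multiply(x, y, triple):
    """<z> = <c> + eps*<b> + delta*<a> + eps*delta, with two partial openings."""
    a, b, c = triple
    eps = partial_open([(xi - ai) % P for xi, ai in zip(x, a)])
    delta = partial_open([(yi - bi) % P for yi, bi in zip(y, b)])
    z = [(ci + eps * bi + delta * ai) % P for ai, bi, ci in zip(a, b, c)]
    z[0] = (z[0] + eps * delta) % P
    return z


def square(x, pair):
    """<z> = <b> + 2*eps*<x> - eps^2, with a single partial opening."""
    a, b = pair
    eps = partial_open([(xi - ai) % P for xi, ai in zip(x, a)])
    z = [(bi + 2 * eps * xi) % P for xi, bi in zip(x, b)]
    z[0] = (z[0] - eps * eps) % P
    return z
```

The squaring formula follows from $x = a + \epsilon$: $x^2 = a^2 + 2 \epsilon a + \epsilon^2 = b + 2 \epsilon (x - \epsilon) + \epsilon^2$. It opens one value instead of two, which is where the saving over a general multiplication comes from.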

The offline phase produces the triples in the following way. We make use of a Somewhat Homomorphic Encryption (SHE) scheme, which encrypts messages in $\mathbb{F}_p$, supports distributed decryption, and allows computation of circuits of multiplicative depth one on encrypted data. To generate a multiplication triple each player $P_i$ generates encryptions of random values $a_i$ and $b_i$ (their shares of $a$ and $b$). Using the multiplicative property of the SHE scheme an encryption of $c = (a_1 + \cdots + a_n) \cdot (b_1 + \cdots + b_n)$ is produced. The players then use the distributed decryption protocol to obtain sharings of $c$. The shares of the MACs on $a$, $b$ and $c$ needed to complete the $\langle \cdot \rangle$-sharing are produced in much the same manner. Similar operations are performed to produce square pairs and shared bits. Clearly the above (vague) outline needs to be fleshed out to ensure the required covert security level. Moreover, in practice we generate many triples/pairs/shared bits at once using the SIMD nature of the BGV SHE scheme.
3 BGV


We now present an overview of the BGV scheme as required by our offline phase. This is only a sketch; the reader is referred to [6, 15, 16] for more details. Our goal is to present enough detail to explain the key generation protocol later.

3.1  Preliminaries 

Underlying Algebra: We fix the ring $R_q = (\mathbb{Z}/q\mathbb{Z})[X]/\Phi_m(X)$ for some cyclotomic polynomial $\Phi_m(X)$, where $m$ is a parameter to be determined later (see Appendix G). Note that $q$ is not necessarily prime. Let $R = \mathbb{Z}[X]/\Phi_m(X)$, and let $\phi(m)$ denote the degree of $R$ over $\mathbb{Z}$, i.e. Euler's $\phi$ function. The message space of our scheme will be $R_p$ for a prime $p$ of approximately 32, 64 or 128 bits in length, whilst ciphertexts will lie in either $R_{q_0}^2$ or $R_{q_1}^2$, for one of two moduli $q_0$ and $q_1$. We select $R = \mathbb{Z}[X]/(X^{m/2} + 1)$ for $m$ a power of two, and $p \equiv 1 \pmod{m}$. By picking $m$ and $p$ this way, the message space $R_p$ offers $m/2$-fold SIMD parallelism, i.e. $R_p \cong \mathbb{F}_p^{m/2}$. In addition, this also implies that the ring constant $c_m$ from [14, 16] is equal to one.
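The $m/2$-fold SIMD structure can be observed on toy parameters. The sketch below assumes $m = 16$ and $p = 17$ (so $p \equiv 1 \pmod{16}$), far smaller than anything used in practice. The polynomial $X^8 + 1$ then has eight distinct roots modulo 17, and evaluating at those roots maps products in $R_p$ to slot-wise products, which is the isomorphism $R_p \cong \mathbb{F}_p^{m/2}$. Function names are hypothetical.

```python
def slot_roots(m, p):
    """Roots of X^(m/2) + 1 mod p; there are m/2 of them when p = 1 mod m."""
    assert m & (m - 1) == 0 and p % m == 1
    return [x for x in range(p) if pow(x, m // 2, p) == p - 1]


def negacyclic_mul(f, g, p):
    """Multiply f and g in Z_p[X]/(X^n + 1), with n = len(f) = len(g)."""
    n = len(f)
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] = (out[i + j] + fi * gj) % p
            else:  # X^n = -1 in this ring
                out[i + j - n] = (out[i + j - n] - fi * gj) % p
    return out


def slots(f, roots, p):
    """Evaluate f at every root: the CRT view of f as a vector in F_p^(m/2)."""
    return [sum(c * pow(r, i, p) for i, c in enumerate(f)) % p for r in roots]


M, PRIME = 16, 17
ROOTS = slot_roots(M, PRIME)  # [3, 5, 6, 7, 10, 11, 12, 14]
```

Each root is a primitive 16th root of unity modulo 17, so the slots of a product are the products of the slots; this is what lets one ciphertext carry $m/2$ independent plaintext values.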

We wish to generate a public key for a leveled BGV scheme for which $n$ players each hold a share, which is itself a “standard” BGV secret key. As we are working with circuits of multiplicative depth at most one, we only need two levels in the moduli chain: $q_0 = p_0$ and $q_1 = p_0 \cdot p_1$. The modulus $p_1$ will also play the role of $P$ in [16] for the SwitchKey operation. The value $p_1$ must be chosen so that $p_1 \equiv 1 \pmod{p}$, with the value of $p_0$ set to ensure valid distributed decryption.
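One simple way to meet the condition $p_1 \equiv 1 \pmod{p}$ is to search along $1 + k \cdot p$ for a prime, as sketched below. This is an illustrative assumption, not necessarily how the parameters in this paper were chosen: the bit sizes are hypothetical, and the Miller-Rabin test with the first twelve prime bases is deterministic at least for all inputs below $2^{64}$.

```python
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime(n):
    """Miller-Rabin with fixed bases; deterministic at least for n < 2**64."""
    if n < 2:
        return False
    for q in SMALL_PRIMES:
        if n % q == 0:
            return n == q
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in SMALL_PRIMES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True


def find_p1(p, bits):
    """Smallest prime p_1 = 1 + k*p with at least `bits` bits (hypothetical helper)."""
    k = (1 << (bits - 1)) // p + 1
    while not is_prime(1 + k * p):
        k += 1
    return 1 + k * p
```

For instance, `find_p1(2**31 - 1, 60)` returns a 60-bit prime congruent to 1 modulo $2^{31} - 1$; the product $q_1 = p_0 \cdot p_1$ then follows once $p_0$ is fixed.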

Random Values: Each player is assumed to have a secure entropy source. In practice we take this to be /dev/urandom, which is a non-blocking entropy source found on Unix-like operating systems. Being non-blocking, this is not a “true” entropy source, but it provides a practical balance between entropy production and performance for our purposes. In what follows we model this source via a procedure $s \leftarrow \mathsf{Seed}()$, which generates a new seed from this source of entropy. Calling this function sets the player's global variable $\mathsf{cnt}$ to zero. Then every time a player generates a new random value in a protocol, this is constructed by calling $\mathsf{PRF}_s(\mathsf{cnt})$, for some pseudo-random function $\mathsf{PRF}$, and then incrementing $\mathsf{cnt}$. In practice we use AES under the key $s$ with message $\mathsf{cnt}$ to implement $\mathsf{PRF}$.

The point of this method for generating random values is that the values can later be verified to have been generated honestly, by revealing $s$, recomputing all the randomness used by a player, and verifying that his output is consistent with this value of $s$.
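The seed-and-counter discipline, and the later audit by revealing $s$, can be sketched as below. Assumption: the Python standard library has no AES, so HMAC-SHA256 keyed with $s$ stands in for the AES-based PRF described above, and `os.urandom` plays the role of /dev/urandom. The class and function names are hypothetical.

```python
import hashlib
import hmac
import os


class SeededRandomness:
    """s <- Seed() resets cnt; each call returns PRF_s(cnt) and increments cnt."""

    def __init__(self, seed=None):
        # Seed() draws from the OS entropy source (/dev/urandom on Unix).
        self.seed = os.urandom(16) if seed is None else seed
        self.cnt = 0

    def next_block(self):
        # HMAC-SHA256 stands in for AES_s(cnt) as used in the paper.
        msg = self.cnt.to_bytes(16, "big")
        self.cnt += 1
        return hmac.new(self.seed, msg, hashlib.sha256).digest()


def audit(revealed_seed, claimed_blocks):
    """Recompute a player's randomness from the revealed seed and compare."""
    replay = SeededRandomness(revealed_seed)
    return all(replay.next_block() == blk for blk in claimed_blocks)
```

Because every random value is a deterministic function of $(s, \mathsf{cnt})$, revealing $s$ alone suffices for another player to replay the whole computation.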

From the basic PRF we define the following “induced” pseudo-random number generators, which generate elements according to the following distributions but seeded by the seed $s$:

- $\mathcal{HWT}_s(h, n)$: This generates a vector of length $n$ with elements chosen at random from $\{-1, 0, 1\}$ subject to the condition that the number of non-zero elements is equal to $h$.

- $\mathcal{ZO}_s(0.5, n)$: This generates a vector of length $n$ with elements chosen from $\{-1, 0, 1\}$ such that the probability of each coefficient is $p_{-1} = 1/4$, $p_0 = 1/2$ and $p_1 = 1/4$.

- $\mathcal{DG}_s(\sigma^2, n)$: This generates a vector of length $n$ with elements chosen according to the discrete Gaussian distribution with variance $\sigma^2$.

- $\mathcal{RC}_s(0.5, \sigma^2, n)$: This generates a triple of elements $(v, e_0, e_1)$ where $v$ is sampled from $\mathcal{ZO}_s(0.5, n)$ and $e_0$ and $e_1$ are sampled from $\mathcal{DG}_s(\sigma^2, n)$.

- $\mathcal{U}_s(q, n)$: This generates a vector of length $n$ with elements generated uniformly modulo $q$.

If any random values are used which do not depend on a seed, then these should be assumed to be drawn using a secure entropy source (again, in practice assumed to be /dev/urandom). If we pull from one of the above distributions where we do not care about the specific seed being used, then we drop the subscript $s$ from the notation.
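The induced generators can be sketched on top of any seeded generator. In the sketch below Python's `random.Random` seeded with $s$ stands in for the PRF-based generator; it is not cryptographically secure and is used only for illustration. The discrete Gaussian is approximated by rounding a continuous Gaussian, which is an assumption rather than an exact discrete Gaussian sampler. Function names are hypothetical.

```python
import math
import random


def hwt(rng, h, n):
    """HWT(h, n): exactly h non-zero entries, each -1 or 1."""
    v = [0] * n
    for idx in rng.sample(range(n), h):
        v[idx] = rng.choice((-1, 1))
    return v


def zo(rng, rho, n):
    """ZO(rho, n): P(-1) = P(1) = rho/2, P(0) = 1 - rho; rho = 0.5 gives 1/4, 1/2, 1/4."""
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(-1 if u < rho / 2 else (1 if u < rho else 0))
    return out


def dg(rng, sigma2, n):
    """DG(sigma^2, n), approximated by rounding a continuous Gaussian."""
    sigma = math.sqrt(sigma2)
    return [round(rng.gauss(0.0, sigma)) for _ in range(n)]


def rc(rng, rho, sigma2, n):
    """RC(rho, sigma^2, n): the triple (v, e_0, e_1)."""
    return zo(rng, rho, n), dg(rng, sigma2, n), dg(rng, sigma2, n)


def uniform(rng, q, n):
    """U(q, n): entries uniform modulo q."""
    return [rng.randrange(q) for _ in range(n)]
```

Passing `random.Random(s)` for the same seed `s` reproduces the same vectors, which mirrors the auditability of seeded randomness described above.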

Broadcast: When broadcasting data we assume two different models. In the online phase, during partial opening, we use the method described in [14], i.e. players send their data to a nominated player who then broadcasts the reconstructed value back to the remaining players. For other applications of broadcast we assume each party broadcasts their values to all other parties directly. In all instances players maintain a running hash of all values sent and received in a broadcast (with a suitable modification for the variant used for partial opening). At the end of a protocol run these running hashes are compared in a pair-wise fashion. This final comparison ensures that, as long as there are at least two honest parties, the adversary must have been consistent in what was sent to the honest parties.
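The running-hash bookkeeping can be sketched as follows. This is a minimal illustration that assumes SHA-256 as the hash and a hypothetical `Transcript` class. Each player absorbs every broadcast value it sends or receives, and at the end the digests are compared pairwise. The message framing (sender index plus a length prefix) is an assumption, and the variant used for partial opening is not modelled.

```python
import hashlib
from itertools import combinations

class Transcript:
    """Running hash of all values a player has sent or received in broadcasts (sketch)."""

    def __init__(self):
        self._h = hashlib.sha256()

    def absorb(self, sender: int, value: bytes) -> None:
        # Length-prefix each item so that different splittings of the same bytes hash differently.
        self._h.update(sender.to_bytes(4, "big"))
        self._h.update(len(value).to_bytes(8, "big"))
        self._h.update(value)

    def digest(self) -> bytes:
        # copy() lets the player keep absorbing after taking an intermediate digest.
        return self._h.copy().digest()

def consistent(transcripts) -> bool:
    """Pairwise comparison of the final running hashes."""
    return all(a.digest() == b.digest() for a, b in combinations(transcripts, 2))
```

If a corrupt sender gives one honest player `b"x"` and another `b"y"`, their digests differ and the pairwise comparison fails, even though each player individually saw a well-formed message.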

Commitments: In Figure 2 we present an ideal functionality $\mathcal{F}_{\mathsf{Commit}}$ for commitment which will be used in all of our protocols. Our protocols will be UC secure; this is possible despite the fact that we allow a dishonest majority because we assume a random oracle is available. In particular, we model a hash function $\mathcal{H}_1$ as a random oracle and define a commitment scheme implementing the functionality $\mathcal{F}_{\mathsf{Commit}}$ as follows. The commit function $\mathsf{Commit}(m)$ generates a random value $r$ and computes $c \leftarrow \mathcal{H}_1(m \| r)$. It returns the pair $(c, o)$, where $o$ is the opening information $m \| r$. When the commitment $c$ is opened, the committer outputs the value $o$ and the receiver runs $\mathsf{Open}(c, o)$, which checks whether $c = \mathcal{H}_1(o)$ and, if the check passes, returns $m$. See Appendix A for details.
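A minimal sketch of $\mathsf{Commit}$ and $\mathsf{Open}$, assuming SHA-256 as a stand-in for the random oracle $\mathcal{H}_1$ and 32 random bytes for $r$. The byte encoding of $m \| r$ and the names `commit`, `open_commitment` and `R_LEN` are hypothetical. Because $r$ has a fixed length, the message is everything except the last `R_LEN` bytes of $o$.

```python
import hashlib
import secrets
from typing import Optional, Tuple

R_LEN = 32  # length of the random value r in bytes (assumption)

def commit(m: bytes) -> Tuple[bytes, bytes]:
    """Return (c, o) with c = H(m || r) and opening information o = m || r."""
    r = secrets.token_bytes(R_LEN)
    o = m + r
    return hashlib.sha256(o).digest(), o

def open_commitment(c: bytes, o: bytes) -> Optional[bytes]:
    """Return m if c = H(o); return None (reject) otherwise."""
    if len(o) < R_LEN or hashlib.sha256(o).digest() != c:
        return None
    return o[:-R_LEN]
```

The random $r$ is what hides $m$ until opening; without it, a low-entropy message could be recovered by simply hashing candidate values.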

3.2  Key  Generation 

The key generation algorithm generates a public/private key pair such that the public key is given by $\mathfrak{pk} = (a, b)$, where $a$ is generated from $\mathcal{U}(q_1, \phi(m))$ (i.e. $a$ is uniform in $R_{q_1}$) and $b = a \cdot s + p \cdot e$, where $e$ is a "small" error term and $s$ is the secret key such that $s = s_1 + \cdots + s_n$, where player $P_i$ holds the share $s_i$. Recall that since $m$ is a power of 2 we have $\phi(m) = m/2$.

The public key is also augmented to an extended public key $\mathfrak{epk}$ by addition of a "quasi-encryption" of the message $-p_1 \cdot s^2$, i.e. $\mathfrak{epk}$ contains a pair $\mathfrak{enc} = (b_{s,s^2}, a_{s,s^2})$ such that $b_{s,s^2} = a_{s,s^2} \cdot s + p \cdot e_{s,s^2} - p_1 \cdot s^2$, where $a_{s,s^2} \leftarrow \mathcal{U}(q_1, \phi(m))$ and $e_{s,s^2}$ is a "small" error term. The precise distributions of all these values will be determined when we discuss the exact key generation protocol we use.


3.3  Encryption  and  Decryption 


$\mathsf{Enc}_{\mathfrak{pk}}(m)$: To encrypt an element $m \in R_p$ using the modulus $q_1$, we choose one "small polynomial" (with $0, \pm 1$ coefficients) and two Gaussian polynomials (with variance $\sigma^2$), via $(v, e_0, e_1) \leftarrow \mathcal{RC}_s(0.5, \sigma^2, \phi(m))$. Then we set

$c_0 = b \cdot v + p \cdot e_0 + m$, $c_1 = a \cdot v + p \cdot e_1$, and set the initial ciphertext as $\mathfrak{c}' = (c_0, c_1, 1)$.
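The encryption formula, and the decryption equation $[c_0 - s \cdot c_1]_{q_1} \bmod p$ described below, can be exercised in a toy ring. This sketch is illustrative only and makes several assumptions: insecure sizes ($N = 8$, $p = 17$, $q_1 = 2^{61} - 1$), a single un-shared secret key sampled from $\mathcal{ZO}$ for brevity (the protocol uses $\mathcal{HWT}$ and shares), and a rounded Gaussian for $\mathcal{DG}$. All function names are hypothetical. Correctness follows from $c_0 - s \cdot c_1 = m + p \cdot (e \cdot v + e_0 - s \cdot e_1)$.

```python
import random

N, P, Q1 = 8, 17, 2**61 - 1   # toy ring degree, plaintext modulus p, modulus q_1 (insecure sizes)

def polymul(a, b):
    """Product in Z_{q_1}[X]/(X^N + 1) (negacyclic convolution)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj   # uses X^N = -1
    return [x % Q1 for x in res]

def padd(*polys):
    """Coefficient-wise sum modulo q_1."""
    return [sum(cs) % Q1 for cs in zip(*polys)]

def smul(k, a):
    """Multiply every coefficient by the integer k modulo q_1."""
    return [(k * x) % Q1 for x in a]

def centered(x, q):
    """Representative of x mod q in (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def zo(rng):
    """ZO(0.5, N): -1 and 1 with probability 1/4 each, 0 with probability 1/2."""
    return [rng.choice((-1, 0, 0, 1)) for _ in range(N)]

def dg(rng, sigma=3.2):
    """Rounded continuous Gaussian as a stand-in for DG(sigma^2, N)."""
    return [round(rng.gauss(0.0, sigma)) for _ in range(N)]

def keygen(rng):
    s = zo(rng)
    a = [rng.randrange(Q1) for _ in range(N)]
    b = padd(polymul(a, s), smul(P, dg(rng)))      # b = a*s + p*e
    return (a, b), s

def encrypt(pk, m, rng):
    a, b = pk
    v, e0, e1 = zo(rng), dg(rng), dg(rng)
    c0 = padd(polymul(b, v), smul(P, e0), m)       # c0 = b*v + p*e0 + m
    c1 = padd(polymul(a, v), smul(P, e1))          # c1 = a*v + p*e1
    return (c0, c1, 1)                             # level 1

def decrypt(s, ct):
    c0, c1, _level = ct
    diff = padd(c0, smul(-1, polymul(s, c1)))      # [c0 - s*c1]_{q_1}
    return [centered(x, Q1) % P for x in diff]
```

Centering before reducing modulo $p$ matters: the noise term $p \cdot (\ldots)$ can be negative, and only the centered representative equals $m + p \cdot (\ldots)$ over the integers.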

$\mathsf{SwitchModulus}((c_0, c_1), \ell)$: The operation $\mathsf{SwitchModulus}(\mathfrak{c})$ takes the ciphertext $\mathfrak{c} = ((c_0, c_1), \ell)$ defined modulo $q_\ell$ and produces a ciphertext $\mathfrak{c}' = ((c_0', c_1'), \ell - 1)$ defined modulo $q_{\ell-1}$, such that $[c_0 - s \cdot c_1]_{q_\ell} \equiv [c_0' - s \cdot c_1']_{q_{\ell-1}} \pmod p$. This is done by setting $c_i' \leftarrow \mathsf{Scale}(c_i, q_\ell, q_{\ell-1})$ for $i = 0, 1$, where $\mathsf{Scale}$ is the function defined in [16]. Note that we need the more complex function of Appendix E of the full version of [16] if working in dCRT representation, as we need to fix the scaling modulo $p$ as opposed to modulo two, which was done in the main body of [16]. As we are only working with two levels, this function can only be called when $\ell = 1$.
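The following sketch shows one way to realise coefficient-wise scaling in which the correction is fixed modulo $p$. It is our own reconstruction, not the exact function of Appendix E of [16], and it assumes $q_1 = q_0 \cdot p_1$ with $p_1 \equiv 1 \pmod p$. For each integer coefficient $c$ it adds a small $\delta$ with $\delta \equiv -c \pmod{p_1}$ and $\delta \equiv 0 \pmod p$, then divides by $p_1$. The result satisfies $c' \approx c/p_1$ with $|c' - c/p_1| \le p/2$, and $c' \equiv c \pmod p$ because $p_1 c' = c + \delta \equiv c$ and $p_1 \equiv 1 \pmod p$. The names `scale_coeff` and `scale_poly` are hypothetical.

```python
def centered(x, m):
    """Representative of x mod m in (-m/2, m/2]."""
    r = x % m
    return r - m if r > m // 2 else r

def scale_coeff(c, p1, p):
    """Return c' close to c / p1 with c' = c (mod p), assuming p1 = 1 (mod p)."""
    assert p1 % p == 1
    d = centered(-c, p1)          # d = -c (mod p1), |d| <= p1/2
    k = centered(-d, p)           # choose k so that d + k*p1 = d + k = 0 (mod p)
    delta = d + k * p1            # delta = -c (mod p1) and delta = 0 (mod p)
    return (c + delta) // p1      # exact division: p1 divides c + delta

def scale_poly(poly, q1, q0, p):
    """Apply the scaling coefficient-wise to an element of R_{q1}, giving representatives of R_{q0}."""
    p1 = q1 // q0
    return [scale_coeff(centered(c, q1), p1, p) for c in poly]
```

The extra term $k \cdot p_1$ is what distinguishes this from plain rounding: it costs at most $p/2$ in additional error but preserves the residue modulo $p$, which is the property the decryption equation needs.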

$\mathsf{Dec}_s(\mathfrak{c})$: Note that this operation is never actually performed, since no one knows the shared secret key $s$, but presenting it will be instructive. Decryption of a ciphertext $(c_0, c_1, \ell)$ at level $\ell$ is performed by setting $m' = [c_0 - s \cdot c_1]_{q_\ell}$, then converting $m'$ to coefficient representation and outputting $m' \bmod p$.

$\mathsf{DistDec}_{s_i}(\mathfrak{c})$: We actually decrypt using a simplification of the distributed decryption procedure described in [14], since our final ciphertexts consist of only two elements as opposed to three in [14]. For input ciphertext $(c_0, c_1, \ell)$, player $P_1$ computes $v_1 = c_0 - s_1 \cdot c_1$ and each other player $P_i$ computes $v_i = -s_i \cdot c_1$. Each party $P_i$ then sets $t_i = v_i + p \cdot r_i$ for some random element $r_i \in R$ with infinity norm bounded by $2^{\mathsf{sec}} \cdot B/(n \cdot p)$, for some statistical security parameter $\mathsf{sec}$, and the values $t_i$ are broadcast; the precise value of $B$ is determined in Appendix G. Then the message is recovered as $[t_1 + \cdots + t_n]_{q_\ell} \bmod p$.
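A sketch of $\mathsf{DistDec}$ under toy, insecure parameters. To stay short, it builds the ciphertext directly from the decryption equation ($c_1$ uniform, $c_0 = s \cdot c_1 + m + p \cdot e$) rather than through $\mathsf{Enc}$. The values of $N$, $p$, $q$, $\mathsf{sec}$ and $B$ are assumptions, with $B$ a loose noise bound chosen for these toy sizes rather than the value derived in Appendix G, and the helper names are hypothetical. The shares $v_i$ sum to $c_0 - s \cdot c_1$, so the masks $p \cdot r_i$ only add a multiple of $p$ that disappears in the final reduction.

```python
import random

N, P, Q = 8, 17, 2**61 - 1
SEC, B = 20, 2**20   # statistical security parameter and noise bound (toy values)

def polymul(a, b, q):
    """Product in Z_q[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj   # X^N = -1
    return [x % q for x in res]

def centered(x, q):
    r = x % q
    return r - q if r > q // 2 else r

def dist_dec(shares, ct, rng):
    """Each P_i broadcasts t_i = v_i + p*r_i; the sum is reduced mod q (centered), then mod p."""
    c0, c1 = ct
    n = len(shares)
    bound = 2**SEC * B // (n * P)              # ||r_i||_inf <= 2^sec * B / (n * p)
    ts = []
    for i, s_i in enumerate(shares):
        s_c1 = polymul(s_i, c1, Q)
        # P_1 (index 0) computes c0 - s_1*c1; every other player computes -s_i*c1.
        v_i = [((c0[k] if i == 0 else 0) - s_c1[k]) % Q for k in range(N)]
        r_i = [rng.randint(-bound, bound) for _ in range(N)]
        ts.append([(v + P * r) % Q for v, r in zip(v_i, r_i)])
    total = [sum(col) % Q for col in zip(*ts)]
    return [centered(x, Q) % P for x in total]
```

The masks $r_i$ are what prevent the broadcast $t_i$ from leaking information about the individual key shares; the bound on them is chosen so that their sum still cannot wrap around modulo $q$.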

3.4  Operations  on  Encrypted  Data 

Homomorphic  addition  follows  trivially  from  the  methods  of  [6, 16].  So  the  main  remaining  task  is  to  deal  with 
multiplication.  We  first  define  a  Switch  Key  operation. 
$\mathsf{SwitchKey}(d_0, d_1, d_2)$: This procedure takes as input an extended ciphertext $\mathfrak{c} = (d_0, d_1, d_2)$ defined modulo $q_1$; this is a ciphertext which is decrypted via the equation

$$[d_0 - s \cdot d_1 - s^2 \cdot d_2]_{q_1}.$$

The $\mathsf{SwitchKey}$ operation also takes the key-switching data $\mathfrak{enc} = (b_{s,s^2}, a_{s,s^2})$ above and produces a standard two-element ciphertext which encrypts the same message but modulo $q_0$:

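- $c_0' \leftarrow p_1 \cdot d_0 + b_{s,s^2} \cdot d_2 \pmod{q_1}$, $c_1' \leftarrow p_1 \cdot d_1 + a_{s,s^2} \cdot d_2 \pmod{q_1}$.

- $c_0'' \leftarrow \mathsf{Scale}(c_0', q_1, q_0)$, $c_1'' \leftarrow \mathsf{Scale}(c_1', q_1, q_0)$.

- Output $((c_0'', c_1''), 0)$.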

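Notice we have the following equality modulo $q_1$:

$$\begin{aligned} c_0' - s \cdot c_1' &= (p_1 \cdot d_0) + d_2 \cdot b_{s,s^2} - s \cdot \big((p_1 \cdot d_1) + d_2 \cdot a_{s,s^2}\big) \\ &= p_1 \cdot (d_0 - s \cdot d_1 - s^2 \cdot d_2) + p \cdot d_2 \cdot e_{s,s^2}. \end{aligned}$$

The requirement $p_1 \equiv 1 \pmod p$ comes from the above equation: we want it to produce the same value as $d_0 - s \cdot d_1 - s^2 \cdot d_2 \bmod q_1$ on reduction modulo $p$, which holds since $p_1 \cdot x \equiv x \pmod p$ and the second term is a multiple of $p$.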
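The derivation above can be checked numerically. The sketch below uses toy parameters ($N$, $p$, $q_0$, $p_1$) and helper names of our own choosing. It builds key-switching data with $b_{s,s^2} = a_{s,s^2} \cdot s + p \cdot e_{s,s^2} - p_1 \cdot s^2$ and confirms the identity modulo $q_1$ for random $d_0, d_1, d_2$. It stops before the $\mathsf{Scale}$ step.

```python
import random

N, P = 8, 17
Q0 = 2**40 + 15           # toy q_0
P1 = 17 * 2**20 + 1       # toy p_1 with p_1 = 1 (mod p)
Q1 = Q0 * P1              # q_1 = q_0 * p_1

def polymul(a, b, q=Q1):
    """Product in Z_q[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj   # X^N = -1
    return [x % q for x in res]

def lin(terms, q=Q1):
    """Linear combination sum(k * poly) mod q over (k, poly) pairs."""
    return [sum(k * poly[i] for k, poly in terms) % q for i in range(N)]

def key_switch_data(s, rng):
    """enc = (b, a) with b = a*s + p*e - p_1*s^2, a quasi-encryption of -p_1*s^2."""
    a = [rng.randrange(Q1) for _ in range(N)]
    e = [rng.randint(-10, 10) for _ in range(N)]
    b = lin([(1, polymul(a, s)), (P, e), (-P1, polymul(s, s))])
    return b, a, e

def switch_key_pre_scale(d, enc):
    """First step of SwitchKey: c0' = p_1*d0 + b*d2, c1' = p_1*d1 + a*d2 (mod q_1)."""
    d0, d1, d2 = d
    b, a = enc
    return lin([(P1, d0), (1, polymul(b, d2))]), lin([(P1, d1), (1, polymul(a, d2))])
```

The term $-s \cdot d_2 \cdot a_{s,s^2}$ cancels the $d_2 \cdot a_{s,s^2} \cdot s$ hidden inside $d_2 \cdot b_{s,s^2}$, which is why the key-switching data can be published without revealing $s^2$.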

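$\mathsf{Mult}(\mathfrak{c}, \mathfrak{c}')$: We only need to execute multiplication on two ciphertexts at level one, thus $\mathfrak{c} = ((c_0, c_1), 1)$ and $\mathfrak{c}' = ((c_0', c_1'), 1)$. The output will be a ciphertext $\mathfrak{c}''$ at level zero, obtained via the following steps:

- $\mathfrak{c} \leftarrow \mathsf{SwitchModulus}(\mathfrak{c})$, $\mathfrak{c}' \leftarrow \mathsf{SwitchModulus}(\mathfrak{c}')$.

- $(d_0, d_1, d_2) \leftarrow (c_0 \cdot c_0', \; c_1 \cdot c_0' + c_0 \cdot c_1', \; -c_1 \cdot c_1')$.

- $\mathfrak{c}'' \leftarrow \mathsf{SwitchKey}(d_0, d_1, d_2)$.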
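The tensoring step can be sanity-checked the same way. For any $s$, $(c_0 - s c_1)(c_0' - s c_1') = d_0 - s d_1 - s^2 d_2$ with $d_0 = c_0 c_0'$, $d_1 = c_1 c_0' + c_0 c_1'$ and $d_2 = -c_1 c_1'$, which is why the extended ciphertext decrypts to the product of the two plaintexts (plus noise). The sketch uses a toy ring modulo $q_0$ and hypothetical helper names.

```python
import random

N, Q0 = 8, 2**40 + 15   # toy ring degree and modulus q_0

def polymul(a, b, q=Q0):
    """Product in Z_q[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj   # X^N = -1
    return [x % q for x in res]

def add(a, b, q=Q0):
    return [(x + y) % q for x, y in zip(a, b)]

def sub(a, b, q=Q0):
    return [(x - y) % q for x, y in zip(a, b)]

def tensor(c, cp):
    """(d0, d1, d2) = (c0*c0', c1*c0' + c0*c1', -c1*c1')."""
    (c0, c1), (c0p, c1p) = c, cp
    d0 = polymul(c0, c0p)
    d1 = add(polymul(c1, c0p), polymul(c0, c1p))
    d2 = sub([0] * N, polymul(c1, c1p))
    return d0, d1, d2

def dec_extended(s, d):
    """[d0 - s*d1 - s^2*d2]_{q_0}, before centering and reduction mod p."""
    d0, d1, d2 = d
    return sub(sub(d0, polymul(s, d1)), polymul(polymul(s, s), d2))

def dec_pair(s, c):
    """[c0 - s*c1]_{q_0} for a standard two-element ciphertext."""
    c0, c1 = c
    return sub(c0, polymul(s, c1))
```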


4  Protocols  Associated  to  the  SHE  Scheme 

In  this  section  we  present  two  sub-protocols  associated  with  the  SHE  scheme;  namely  our  distributed  key  generation 
and  a  protocol  for  proving  that  a  committed  ciphertext  is  well  formed. 


4.1  Distributed  Key  Generation  Protocol  For  BGV 

To  make  the  paper  easier  to  follow  we  present  the  precise  protocols,  ideal  functionalities,  simulators  and  security 
proofs  in  Appendix  B.  Here  we  present  a  high  level  overview. 

As remarked in the introduction, the authors of [14] assumed a "magic" setup which produces not only a distributed sharing of the main BGV secret key, but also a distributed sharing of the square of the secret key. That was assumed to be done via some other unspecified MPC protocol. The effect of requiring a sharing of the square of the secret key was that they did not need to perform KeySwitching, but ciphertexts were 50% bigger than one would otherwise expect. Here we take a very different approach: we augment the public key with the keyswitching data from [16] and provide an explicit covertly secure key generation protocol.

Our protocol will be covertly secure in the sense that the probability that an adversary can deviate without being detected will be bounded by $1/c$, for a positive integer $c$. Our basic idea for achieving covert security is as follows. Each player runs $c$ instances of the basic protocol, each with different random seeds; then, at the end of the main protocol, all basic protocol runs except one randomly chosen run are opened, along with the respective random seeds. All parties then check that the opened runs were performed honestly and, if any party finds an inconsistency, the protocol aborts. If no problem is detected, the parties assume that the single unopened run is correct. Thus, intuitively, the adversary can cheat with probability at most $1/c$.
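The cut-and-choose idea can be illustrated with a small sketch. Here `basic_run` is a hypothetical stand-in for one run of the basic protocol whose output is fully determined by its seed (modelled with SHA-256), and the index `keep` of the unopened run is passed in as an argument. In a real execution that index must be unpredictable to the adversary at the time it commits; how it is chosen is an assumption outside this sketch. If exactly one run is tampered with, it escapes detection only when it is the unopened one, i.e. for exactly one of the $c$ possible choices.

```python
import hashlib
import secrets

def basic_run(seed: bytes) -> bytes:
    """Hypothetical stand-in for one run of the basic protocol: output determined by the seed."""
    return hashlib.sha256(b"basic-run|" + seed).digest()

def cut_and_choose(seeds, outputs, keep):
    """Open all runs except `keep`, recompute them from their seeds, and abort on any mismatch."""
    for j, (seed, out) in enumerate(zip(seeds, outputs)):
        if j != keep and basic_run(seed) != out:
            raise RuntimeError(f"inconsistency in opened run {j}: abort")
    return outputs[keep]          # the single unopened run is assumed correct

def undetected_fraction(c, cheat_index):
    """Fraction of choices of `keep` for which cheating in run `cheat_index` goes unnoticed."""
    seeds = [secrets.token_bytes(16) for _ in range(c)]
    outputs = [basic_run(sd) for sd in seeds]
    outputs[cheat_index] = b"tampered"
    passed = 0
    for keep in range(c):
        try:
            cut_and_choose(seeds, outputs, keep)
            passed += 1
        except RuntimeError:
            pass
    return passed / c
```

For example, `undetected_fraction(5, 2)` is `0.2`, matching the $1/c$ bound for $c = 5$.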

We start by discussing the generation of the main public key in execution $j$, where $j \in \{1, \ldots, c\}$. To start with, the players generate a uniformly random value $a_j \in R_{q_1}$. They then each execute the standard BGV key generation procedure, except that this is done with respect to the global element $a_j$. Player $i$ chooses a low-weight secret key and then generates an LWE instance relative to that secret key. Following [16], we choose

$s_{i,j} \leftarrow \mathcal{HWT}_s(h, \phi(m))$ and $e_{i,j} \leftarrow \mathcal{DG}_s(\sigma^2, \phi(m))$.

Then the player sets the secret key as $s_{i,j}$ and their "local" public key as $(a_j, b_{i,j})$, where $b_{i,j} = [a_j \cdot s_{i,j} + p \cdot e_{i,j}]_{q_1}$.

Note that, by a hybrid argument, obtaining $n$ ring-LWE instances for $n$ different secret keys but the same value of $a_j$ is secure, assuming obtaining one ring-LWE instance is secure. In the LWE literature this is called "amortization". Also note that in what follows a key modulo $q_1$ can also be treated as a key modulo $q_0$, since $q_0$ divides $q_1$ and $s_{i,j}$ has coefficients in $\{-1, 0, 1\}$.

The global public and private keys are then set to be $\mathfrak{pk}_j = (a_j, b_j)$ and $s_j = s_{1,j} + \cdots + s_{n,j}$, where $b_j = [b_{1,j} + \cdots + b_{n,j}]_{q_1}$. This is essentially another BGV key pair, since if we set $e_j = e_{1,j} + \cdots + e_{n,j}$ then we have

$$b_j = \sum_{i=1}^{n} (a_j \cdot s_{i,j} + p \cdot e_{i,j}) = a_j \cdot s_j + p \cdot e_j,$$

but generated with different distributions for $s_j$ and $e_j$ compared to the individual key pairs above.
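The aggregation of local keys can be sketched as follows. The parameters (degree 16 standing in for $\phi(m)$, $p = 17$, $q_1 = 2^{61} - 1$, $h = 4$) and the rounded-Gaussian error are toy assumptions, and `local_keygen` and `global_key` are hypothetical names. The check confirms that summing the local $b_{i,j}$ yields a key of the form $a_j \cdot s_j + p \cdot e_j$ for the summed secret and error.

```python
import random

N, P, Q1, H = 16, 17, 2**61 - 1, 4   # toy degree phi(m), p, q_1 and Hamming weight h

def polymul(a, b):
    """Product in Z_{q_1}[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj   # X^N = -1
    return [x % Q1 for x in res]

def hwt(rng, h):
    """HWT(h, N): exactly h coefficients in {-1, 1}, the rest 0."""
    v = [0] * N
    for i in rng.sample(range(N), h):
        v[i] = rng.choice((-1, 1))
    return v

def local_keygen(a, rng):
    """Player i: s_i <- HWT(h, N), e_i <- rounded Gaussian, b_i = [a*s_i + p*e_i]_{q_1}."""
    s_i = hwt(rng, H)
    e_i = [round(rng.gauss(0.0, 3.2)) for _ in range(N)]
    b_i = [(x + P * e) % Q1 for x, e in zip(polymul(a, s_i), e_i)]
    return s_i, e_i, b_i

def global_key(parts):
    """b = [b_1 + ... + b_n]_{q_1}; s and e are the integer sums of the shares."""
    s = [sum(col) for col in zip(*(pt[0] for pt in parts))]
    e = [sum(col) for col in zip(*(pt[1] for pt in parts))]
    b = [sum(col) % Q1 for col in zip(*(pt[2] for pt in parts))]
    return s, e, b
```

Note that the summed secret $s_j$ is no longer of weight $h$ and the summed error $e_j$ has larger variance, which is the "different distributions" remark above.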

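We next augment the above basic key generation to enable the construction of the KeySwitching data. Given a public key $\mathfrak{pk}_j$ and a share of the secret key $s_{i,j}$, our method for producing the extended public key is to produce in turn the following (see Figure 3 for the details on how we create these elements in our protocol):

- $\mathfrak{enc}'_{i,j} \leftarrow \mathsf{Enc}_{\mathfrak{pk}_j}(-p_1 \cdot s_{i,j})$.
- $\mathfrak{enc}'_j \leftarrow \mathfrak{enc}'_{1,j} + \cdots + \mathfrak{enc}'_{n,j}$.
- $\mathfrak{zero}_{i,j} \leftarrow \mathsf{Enc}_{\mathfrak{pk}_j}(0)$.
- $\mathfrak{enc}_{i,j} \leftarrow (s_{i,j} \cdot \mathfrak{enc}'_j) + \mathfrak{zero}_{i,j} \in R_{q_1}^2$.
- $\mathfrak{enc}_j \leftarrow \mathfrak{enc}_{1,j} + \cdots + \mathfrak{enc}_{n,j}$.
- $\mathfrak{epk}_j \leftarrow (\mathfrak{pk}_j, \mathfrak{enc}_j)$.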

Note that $\mathfrak{enc}'_{i,j}$ is not a valid encryption of $-p_1 \cdot s_{i,j}$, since $-p_1 \cdot s_{i,j}$ does not lie in the message space of the encryption scheme. However, because of the dependence on the secret key shares here, we need to assume a form of circular security; the precise assumption needed is stated in Appendix B. The encryption of zero, $\mathfrak{zero}_{i,j}$, is added on by each player to re-randomize the ciphertext, preventing an adversary from recovering $s_{i,j}$ from $\mathfrak{enc}_{i,j}/\mathfrak{enc}'_j$. We call the resulting $\mathfrak{epk}_j$ the extended public key. In [16] the keyswitching data $\mathfrak{enc}_j$ is computed directly from $s_j^2$; however, we need to use the above round-about way since $s_j$ is not available to the parties.
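The round-about construction can be checked in a toy setting. The sketch follows the list above with $n = 3$ players and verifies that the summed $\mathfrak{enc}_j = (b_{s,s^2}, a_{s,s^2})$ satisfies $b_{s,s^2} - s \cdot a_{s,s^2} + p_1 \cdot s^2 \equiv p \cdot (\text{small}) \pmod{q_1}$, even though no party ever forms $s^2$. The parameters, the uniform noise in $[-3, 3]$ standing in for Gaussian noise, and all names are assumptions for illustration. The identity holds because $s \cdot \mathfrak{enc}'_j$ "decrypts" to $s \cdot (-p_1 \cdot s + p \cdot \varepsilon')$.

```python
import random

N, P = 8, 17
P1 = 17 * 2**20 + 1               # toy p_1 with p_1 = 1 (mod p)
Q1 = (2**40 + 15) * P1            # toy q_1 = q_0 * p_1

def polymul(a, b, q=Q1):
    """Product in Z_q[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj   # X^N = -1
    return [x % q for x in res]

def lin(terms, q=Q1):
    """Linear combination sum(k * poly) mod q over (k, poly) pairs."""
    return [sum(k * poly[i] for k, poly in terms) % q for i in range(N)]

def centered(x, q=Q1):
    r = x % q
    return r - q if r > q // 2 else r

def ternary(rng):
    return [rng.choice((-1, 0, 1)) for _ in range(N)]

def small(rng):
    return [rng.randint(-3, 3) for _ in range(N)]   # stand-in for Gaussian noise

def enc(pk, msg, rng):
    """(c0, c1) = (b*v + p*e0 + msg, a*v + p*e1) mod q_1; msg need not lie in R_p here."""
    a, b = pk
    v = ternary(rng)
    return (lin([(1, polymul(b, v)), (P, small(rng)), (1, msg)]),
            lin([(1, polymul(a, v)), (P, small(rng))]))

def extended_key(shares, pk, rng):
    """Return enc_j = (b_{s,s^2}, a_{s,s^2}) built from the shares s_i only."""
    enc_p = [enc(pk, lin([(-P1, si)]), rng) for si in shares]          # enc'_{i,j}
    enc_sum = [lin([(1, c[k]) for c in enc_p]) for k in (0, 1)]         # enc'_j
    parts = []
    for si in shares:
        z = enc(pk, [0] * N, rng)                                        # zero_{i,j}
        parts.append([lin([(1, polymul(si, enc_sum[k])), (1, z[k])]) for k in (0, 1)])
    return tuple(lin([(1, part[k]) for part in parts]) for k in (0, 1))  # enc_j
```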

Finally we open all but one of the $c$ executions and check that they have been executed correctly. If all checks pass, then the final extended public key $\mathfrak{epk}$ is output and the players keep hold of their associated secret key shares $s_i$. See Figure 3 for full details of the protocol.

Theorem 1. In the $\mathcal{F}_{\mathsf{Commit}}$-hybrid model, the protocol $\Pi_{\mathsf{KeyGen}}$ implements $\mathcal{F}_{\mathsf{KeyGen}}$ with computational security against any static adversary corrupting at most $n - 1$ parties.

Recall that $\mathcal{F}_{\mathsf{Commit}}$ is a standard functionality for commitment. $\mathcal{F}_{\mathsf{KeyGen}}$ simply generates a key pair with a distribution matching what we sketched above, and then sends the values $a_i, b_i, \mathfrak{enc}'_i, \mathfrak{enc}_i$ for every $i$ to all parties, and shares of the secret key to the honest players. Like most functionalities in the following, it allows the adversary to try to cheat, and will allow this with a certain probability $1/c$. This is how we model covert security.

The BGV cryptosystem resulting from $\mathcal{F}_{\mathsf{KeyGen}}$ is proven semantically secure by the following theorem.

Theorem 2. If the functionality $\mathcal{F}_{\mathsf{KeyGen}}$ is used to produce a public key $\mathfrak{epk}$ and secret keys $s_i$ for $i = 0, \ldots, n-1$, then the resulting cryptosystem is semantically secure based on the hardness of $\mathsf{RLWE}_{q_1}$ and the circular security assumption in Appendix B.


4.2 EncCommit


We use a sub-protocol $\Pi_{\mathsf{EncCommit}}$ to replace the $\Pi_{\mathsf{ZKPoPK}}$ protocol from [14]. In this section we consider a covertly secure variant rather than active security; this means that players controlled by a malicious adversary succeed in deviating from the protocol with a probability bounded by $1/c$. In our experiments we pick $c = 5$, 10 and 20. In Appendix F we present an actively secure variant of this protocol.

Our new sub-protocol assumes that players have agreed on the key material for the encryption scheme, i.e. $\Pi_{\mathsf{EncCommit}}$ runs in the $\mathcal{F}_{\mathsf{KeyGen}}$-hybrid model. The protocol ensures that a party outputs a validly created ciphertext containing an encryption of some pseudo-random message $m$, where the message $m$ is drawn from a distribution satisfying condition $\mathsf{cond}$. This is done by committing to seeds and using the cut-and-choose technique, similarly to the key generation protocol. The condition $\mathsf{cond}$ in our application could either be that $m$ is uniformly pseudo-randomly generated from $R_p$, or that it is uniformly pseudo-randomly generated from $\mathbb{F}_p$ (i.e. a "diagonal" element in the SIMD representation).
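To make the "diagonal" notion concrete: when $X^N + 1$ splits into linear factors modulo $p$, $R_p$ decomposes into $N$ slots given by evaluation at the roots, and we read a diagonal element as a constant polynomial $x \in \mathbb{F}_p$, which places the same value in every slot. The toy sketch below assumes $N = 8$ and $p = 17$ (so $2N$ divides $p - 1$), and uses the fact that 3 is a primitive root modulo 17, so the odd powers $3^{2k+1}$ are the roots of $X^8 + 1$. These parameters and names are illustrative and are not the ones used in our experiments. The final check shows that ring multiplication acts slot-wise.

```python
P, N, G = 17, 8, 3     # toy plaintext modulus, ring degree, primitive root mod 17 (assumptions)
ROOTS = [pow(G, 2 * k + 1, P) for k in range(N)]   # roots of X^8 + 1 in F_17

def slots(poly):
    """SIMD slots: evaluate poly at every root of X^N + 1 modulo p."""
    return [sum(c * pow(r, i, P) for i, c in enumerate(poly)) % P for r in ROOTS]

def polymul(a, b):
    """Product in F_p[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj   # X^N = -1
    return [x % P for x in res]

def diagonal(x):
    """Embed x in F_p as the constant polynomial x, i.e. a diagonal element."""
    return [x % P] + [0] * (N - 1)
```

For instance, `slots(diagonal(5))` is `[5] * 8`.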

The protocol $\Pi_{\mathsf{EncCommit}}$ and the ideal functionality it implements are presented in Appendix C, along with the proof of the following theorem.

Theorem 3. In the $(\mathcal{F}_{\mathsf{Commit}}, \mathcal{F}_{\mathsf{KeyGen}})$-hybrid model, the protocol $\Pi_{\mathsf{EncCommit}}$ implements $\mathcal{F}_{\mathsf{SHE}}$ with computational security against any static adversary corrupting at most $n - 1$ parties.

$\mathcal{F}_{\mathsf{SHE}}$ offers the same functionality as $\mathcal{F}_{\mathsf{KeyGen}}$ but can in addition generate correctly formed ciphertexts where the plaintext satisfies a condition $\mathsf{cond}$ as explained above, and where the plaintext is known to a particular player (even if he is corrupt). Of course, if we use the actively secure version of $\Pi_{\mathsf{EncCommit}}$ from Appendix F, we get a version of $\mathcal{F}_{\mathsf{SHE}}$ where the adversary is not allowed to attempt cheating.


5  The  Offline  Phase 

The  offline  phase  produces  pre-processed  data  for  the  online  phase  (where  the  secure  computation  is  performed).  To 
ensure  security  against  active  adversaries  the  MAC  values  of  any  partially  opened  value  need  to  be  verified.  We  suggest 
a  new  method  for  this  that  overcomes  some  limitations  of  the  corresponding  method  from  [14].  Since  it  will  be  used 
both  in  the  offline  and  the  online  phase,  we  explain  it  here,  before  discussing  the  offline  phase. 


5.1  MAC  Checking 

We assume some value $a$ has been $\langle \cdot \rangle$-shared and partially opened, which means that players have revealed shares of $a$ but not of the associated MAC value $\gamma$, which is still additively shared. Since there is no guarantee that the values $a$ are correct, we need to check that $\gamma = \alpha a$, where $\alpha$ is the global MAC key, which is also additively shared. In [14], this was done by having players commit to their shares of the MAC, then open $\alpha$ and check everything in the clear. But this means that other shared values become useless, because the MAC key is now public and the adversary could manipulate them as he desires.

So we want to avoid opening $\alpha$, and observe that since $a$ is public, the value $\gamma - \alpha a$ is a linear function of the shared values $\gamma$ and $\alpha$, so players can compute shares of this value locally and we can then check whether it is 0 without revealing information on $\alpha$. As in [14], we can optimize the cost of this by checking many MACs in one go: we take a random linear combination of the $a$ and $\gamma$ values and check only the result of this. The full protocol is given in Figure 10; it is not intended to implement any functionality, it is just a procedure that can be called in both the offline and online phases. MACCheck has the following important properties.
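The core of the check can be sketched as follows, with several simplifying assumptions: a toy prime field with $p = 2^{61} - 1$, a hypothetical trusted-dealer helper `deal` that produces the shares, random coefficients $r_j$ drawn locally rather than from the jointly agreed seed of Figure 10, and the commitments to the $\sigma_i$ omitted. Each player computes $\sigma_i = \sum_j r_j \gamma_{i,j} - \alpha_i a$ locally, where $a = \sum_j r_j a_j$ is public, and the check passes if and only if $\sum_i \sigma_i = 0$, so $\alpha$ is never opened.

```python
import random

P = 2**61 - 1   # toy prime field F_p (assumption)

def share(x, n, rng):
    """Additive sharing of x among n players."""
    parts = [rng.randrange(P) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % P]

def deal(values, n, rng):
    """Hypothetical dealer: share alpha and gamma_j = alpha * a_j among n players."""
    alpha = rng.randrange(1, P)
    alpha_shares = share(alpha, n, rng)
    per_value = [share(alpha * v % P, n, rng) for v in values]
    gamma_shares = [[col[i] for col in per_value] for i in range(n)]   # gamma_shares[i][j]
    return alpha_shares, gamma_shares

def mac_check(opened, gamma_shares, alpha_shares, rng):
    """Accept iff sum_j r_j*gamma_j == alpha * sum_j r_j*a_j, without opening alpha."""
    r = [rng.randrange(P) for _ in opened]
    a = sum(rj * aj for rj, aj in zip(r, opened)) % P                  # public value
    sigmas = [(sum(rj * g for rj, g in zip(r, g_i)) - alpha_i * a) % P  # computed locally by P_i
              for g_i, alpha_i in zip(gamma_shares, alpha_shares)]
    # In the protocol each sigma_i would be committed to before being opened.
    return sum(sigmas) % P == 0
```

If some opened $a_j$ is shifted by $\Delta$, the sum becomes $-\alpha r_j \Delta$, which is non-zero unless $r_j = 0$; this is where the small soundness error of Lemma 1 comes from.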

Lemma 1. The protocol MACCheck is correct, i.e. it accepts if all the values $a_j$ and the corresponding MACs are correctly computed. Moreover, it is sound, i.e. it rejects except with probability $2/p$ in case at least one value or MAC is not correctly computed.

The  proof  of  Lemma  1  is  given  in  Appendix  D.3. 
5.2 Offline Protocol


The offline phase itself runs two distinct sub-phases, each of which we now describe. To start with, we assume a BGV key has been distributed according to the key generation procedure described earlier, as well as the shares of a secret MAC key and an encryption $\mathfrak{c}_\alpha$ of the MAC key as above. We assume that the output of the offline phase will be a total of at least $n_I$ input tuples, $n_m$ multiplication triples, $n_s$ squaring tuples and $n_b$ shared bits.

In the first sub-phase, which we call the tuple-production sub-phase, we over-produce the various multiplication and squaring tuples, plus the shared bits. These are then "sacrificed" in the tuple-checking sub-phase so as to create at least $n_m$ multiplication triples, $n_s$ squaring tuples and $n_b$ shared bits. In particular, in the tuple-production sub-phase we produce (at least) $2 \cdot n_m$ multiplication tuples, $2 \cdot n_s + n_b$ squaring tuples, and $n_b$ shared bits. Tuple production is performed by following the protocol in Figure 13 and Figure 14. The tuple-production protocol can be run repeatedly, alongside the tuple-checking sub-phase and the online phase.

The second sub-phase of the offline phase is to check whether the resulting material from the prior sub-phase has been produced correctly. This check is needed because the distributed decryption procedure used to produce the tuples and the MACs could allow the adversary to induce errors. We solve this problem via a sacrificing technique, as in [14]; however, we also need to adapt it to the case of squaring tuples and bit-sharings. Moreover, this sacrificing is performed in the offline phase as opposed to the online phase (as in [14]), and the resulting partially opened values are checked in the offline phase (again as opposed to the online phase). This is made possible by our protocol MACCheck, which allows the parties to verify that the MACs are correct without revealing the MAC key $\alpha$. The tuple-checking protocol is presented in Figure 15.
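For multiplication triples, a sacrifice check in the style of [14] can be sketched on plain field values. This sketch assumes the standard SPDZ form of the check, and Figure 15 may differ in detail. In the protocol the values are shared, $\rho = t \cdot a - f$ and $\sigma = b - g$ are partially opened, and the final value is verified through MACCheck; none of that machinery is modelled here, and the names are hypothetical. Expanding the check value gives $t \cdot (c - ab) - (h - fg)$, so a bad triple passes only if the sacrificed triple happens to carry a matching error for the random $t$.

```python
import random

P = 2**61 - 1   # toy prime field (assumption)

def random_triple(rng):
    """A correct triple (a, b, a*b), standing in for one from tuple production."""
    a, b = rng.randrange(P), rng.randrange(P)
    return a, b, a * b % P

def sacrifice_check(triple, other, t):
    """Check (a, b, c) by sacrificing (f, g, h) with random t.

    rho = t*a - f and sigma = b - g are opened; the check value
    t*c - h - sigma*f - rho*g - sigma*rho equals t*(c - a*b) - (h - f*g).
    """
    a, b, c = triple
    f, g, h = other
    rho = (t * a - f) % P
    sigma = (b - g) % P
    return (t * c - h - sigma * f - rho * g - sigma * rho) % P == 0
```

This is also why tuple production over-produces by a factor of two: every checked triple consumes one sacrificed triple.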

We show that the resulting protocol $\Pi_{\mathsf{Prep}}$, given in Figure 12, securely implements the functionality $\mathcal{F}_{\mathsf{Prep}}$, which models the offline phase. The functionality $\mathcal{F}_{\mathsf{Prep}}$ outputs some desired number of multiplication triples, squaring tuples and shared bits. In Appendix D we present a proof of the following theorem.

Theorem 4. In the $(\mathcal{F}_{\mathsf{SHE}}, \mathcal{F}_{\mathsf{Commit}})$-hybrid model, the protocol $\Pi_{\mathsf{Prep}}$ implements $\mathcal{F}_{\mathsf{Prep}}$ with computational security against any static adversary corrupting at most $n - 1$ parties if $p$ is exponential in the security parameter.

The security flavour of $\Pi_{\mathsf{Prep}}$ follows the security of EncCommit, i.e. if one uses the covert (resp. active) version of EncCommit, one gets covert (resp. active) security for $\Pi_{\mathsf{Prep}}$.


6  Online  Phase 


We design a protocol $\Pi_{\mathsf{Online}}$ which performs the secure computation of the desired function, decomposed as a circuit over $\mathbb{F}_p$. Our online protocol makes use of the preprocessed data coming from $\mathcal{F}_{\mathsf{Prep}}$ in order to input, add, multiply or square values. Our protocol is similar to the one described in [14]; however, it brings a series of improvements: we push the "sacrificing" to the preprocessing phase, we have a specialised procedure for squaring, and we make use of a different MAC-checking method in the output phase. Our method for checking the MACs is simply the MACCheck protocol applied to all partially opened values; note that this method has a lower soundness error than the method proposed in [14], since the linear combination of partially opened values is truly random in our case, while it has lower entropy in [14].
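A sketch of how preprocessed tuples can be consumed to multiply and square shared values. It assumes the standard Beaver-style formulas used by SPDZ-type online phases, a toy prime field, and additive shares without MACs. `share`, `open_`, `beaver_mult` and `square` are hypothetical names, and the MAC bookkeeping and MACCheck are omitted. With a triple $(a, b, c = ab)$ and opened $\varepsilon = x - a$, $\rho = y - b$, we get $xy = c + \varepsilon b + \rho a + \varepsilon\rho$. With a square pair $(a, a^2)$ a single opening suffices, since $x^2 = a^2 + 2\varepsilon a + \varepsilon^2$.

```python
import random

P = 2**61 - 1   # toy prime field (assumption)

def share(x, n, rng):
    """Additive sharing of x among n players."""
    parts = [rng.randrange(P) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % P]

def open_(shares):
    """Partial opening: reconstruct a shared value."""
    return sum(shares) % P

def beaver_mult(x_sh, y_sh, triple_sh):
    """Multiply shared x, y with a shared triple (a, b, c = a*b)."""
    a_sh, b_sh, c_sh = triple_sh
    eps = open_([(x - a) % P for x, a in zip(x_sh, a_sh)])    # eps = x - a
    rho = open_([(y - b) % P for y, b in zip(y_sh, b_sh)])    # rho = y - b
    z_sh = [(c + eps * b + rho * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + eps * rho) % P                       # public term added by one player only
    return z_sh

def square(x_sh, pair_sh):
    """Square shared x with a shared pair (a, b = a^2): x^2 = b + 2*eps*a + eps^2."""
    a_sh, b_sh = pair_sh
    eps = open_([(x - a) % P for x, a in zip(x_sh, a_sh)])
    z_sh = [(b + 2 * eps * a) % P for a, b in zip(a_sh, b_sh)]
    z_sh[0] = (z_sh[0] + eps * eps) % P
    return z_sh
```

The squaring variant opens one value instead of two, which is the saving that motivates a dedicated procedure for squaring.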

The following theorem, whose proof is given in Appendix E, shows that the protocol $\Pi_{\mathsf{Online}}$, given in Figure 20, securely implements the functionality $\mathcal{F}_{\mathsf{Online}}$, which models the online phase.

Theorem 5. In the $\mathcal{F}_{\mathsf{Prep}}$-hybrid model, the protocol $\Pi_{\mathsf{Online}}$ implements $\mathcal{F}_{\mathsf{Online}}$ with computational security against any static adversary corrupting at most $n - 1$ parties if $p$ is exponential in the security parameter.

The  astute  reader  will  be  wondering  where  our  shared  bits  produced  in  the  offline  phase  are  used.  These  will  be 
used  in  “higher  level”  versions  of  the  online  phase  (i.e.  versions  which  do  not  just  evaluate  an  arithmetic  circuit)  which 
implement  the  types  of  operations  presented  in  [8, 9]. 




APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


7  Experimental  Results 
7.1  KeyGen  and  Offline  Protocols 

To present performance numbers for our key generation and new variant of the offline phase for SPDZ, we first need to define secure parameter sizes for the underlying BGV scheme (and in particular how it is used in our protocols). This is done in Appendix G, by utilizing the method of Appendices A.4, A.5 and B of [16], for various choices of $n$ (the number of players) and $p$ (the field size).

We  then  implemented  the  preceding  protocols  in  C++  on  top  of  the  MPIR  library  for  multi-precision  arithmetic. 
Modular  arithmetic  was  implemented  with  bespoke  code  using  Montgomery  arithmetic  [20]  and  calls  to  the  underlying 
mpn_  functions  in  MPIR.  The  offline  phase  was  implemented  in  a  multi-threaded  manner,  with  four  cores  producing 
initial  multiplication  triples,  square  pairs,  shared  bits  and  input  preparation  mask  values.  Then  two  cores  performed 
the  sacrificing  for  the  multiplication  triples,  square  pairs  and  shared  bits. 

In Table 1 we present execution times (wall time, in seconds) for key generation and for an offline phase which produces 100000 each of multiplication triples, square pairs and shared bits, plus 1000 input sharings. We also present the average time to produce a multiplication triple for an offline phase running on one core and producing 100000 multiplication triples only. The run times are given for various values of $n$, $p$ and $c$, and all timings were obtained on 2.80 GHz Intel Core i7 machines with 4 GB RAM, with the machines running on a local network.


n   p ≈     c    KeyGen run time (s)   Offline run time (s)   Time per Triple (s)
2   2^32    5     2.4                   156                   0.00140
2   2^32    10    5.1                   277                   0.00256
2   2^32    20    10.4                  512                   0.00483
2   2^64    5     5.9                   202                   0.00194
2   2^64    10    12.5                  377                   0.00333
2   2^64    20    25.6                  682                   0.00634
2   2^128   5     16.2                  307                   0.00271
2   2^128   10    33.6                  561                   0.00489
2   2^128   20    74.5                  1114                  0.00937
3   2^32    5     3.0                   292                   0.00204
3   2^32    10    6.4                   413                   0.00380
3   2^32    20    13.3                  790                   0.00731
3   2^64    5     7.7                   292                   0.00267
3   2^64    10    16.3                  568                   0.00497
3   2^64    20    33.7                  1108                  0.01004
3   2^128   5     21.0                  462                   0.00402
3   2^128   10    44.4                  889                   0.00759
3   2^128   20    99.4                  2030                  0.01487

Table  1.  Execution  Times  For  Key  Gen  and  Offline  Phase  (Covert  Security) 


We compare the results to those obtained in [12], since no other protocol can provide malicious/covert security for $t < n$ corrupted parties. In the case of covert security, the authors of [12] report figures of 0.002 seconds per (unchecked) 64-bit multiplication triple for both two and three players; however, the probability of cheating being detected was lower bounded by 1/2 for two players and 1/4 for three players, as opposed to our probabilities of 4/5, 9/10 and 19/20. Since the triples in [12] were unchecked, we need to scale their run times by a factor of two, to obtain 0.004 seconds per multiplication triple. Thus for covert security we see that our protocol for checked tuples is superior in terms of error probabilities, for a comparable run time.

When using our active security variant we aimed for a cheating probability of $2^{-40}$, so as to be able to compare with prior run times obtained in [12], which used the method from [14]. Again we performed two experiments: one where four cores produced 100000 multiplication triples, squaring pairs and shared bits, plus 1000 input sharings; and one where one core produced just 100000 multiplication triples (so as to obtain the average cost of a triple). The results are in Table 2.

By way of comparison, for a prime of 64 bits the authors of [12] report on an implementation which takes 0.006 seconds to produce an (unchecked) multiplication triple for the case of two parties and equivalent active security, and 0.008 seconds for the case of three parties and active security. As we produce checked triples, the cost per triple for the results in [12] needs to be (at least) doubled, giving a total of 0.012 and 0.016 seconds respectively.

Thus, in this test, our new active protocol has a running time about twice that of the previous active protocol from [14] based on ZKPoKs. From the analysis of the protocols, we do expect that the new method will be faster, but only if we produce the output in large enough batches. Due to memory constraints we were so far unable to do this, but we can extrapolate from these results: in the test we generated 12 ciphertexts in one go, and if we were able to increase this by a factor of about 10, then we would get results better than those of [14, 12], all other things being equal. More information can be found in Appendix F.

p ≈     n = 2: Offline (s)   n = 2: Time per Triple (s)   n = 3: Offline (s)   n = 3: Time per Triple (s)
2^32    2366                 0.01955                      3668                 0.02868
2^64    3751                 0.02749                      5495                 0.04107
2^128   6302                 0.04252                      10063                0.06317

Table 2. Execution Times For Offline Phase (Active Security)

7.2  Online 

For the new online phase we have developed a purpose-built bytecode interpreter, which reads and executes pre-generated sequences of instructions in a multi-threaded manner. Our runtime supports parallelism on two different levels: independent rounds of communication can be merged together to reduce network overhead, and multiple threads can be executed at once to allow for optimal usage of modern multi-core processors.

Each bytecode instruction is either some local computation (e.g. addition of secret shared values) or an ‘open’ instruction, which initiates the protocol to reveal a shared value. The data from independent open instructions can be merged together to save on communication costs. Each player may run up to four different bytecode files in parallel in distinct threads, with each such thread having access to some shared memory resource. The advantage of this approach is that bytecode files can be pre-compiled and optimized, and then quickly loaded at runtime; the online phase runtime is itself oblivious to the nature of the programs being run.
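Merging independent opens can be illustrated with a toy scheduler. The instruction format (`("open", dst, src)` and `("add", dst, x, y)`) and the grouping rule are hypothetical and much simpler than our interpreter's instruction set. A run of consecutive open instructions is sent in one communication round unless an open reads a register written by an earlier open in the same run, and local instructions close the current round.

```python
from typing import List, Tuple

Instr = Tuple[str, ...]   # hypothetical format: ("open", dst, src) or ("add", dst, x, y)

def schedule_rounds(program: List[Instr]) -> List[List[Instr]]:
    """Group open instructions into communication rounds."""
    rounds: List[List[Instr]] = []
    batch: List[Instr] = []
    written = set()
    for ins in program:
        if ins[0] == "open" and ins[2] not in written:
            batch.append(ins)             # independent of the opens already batched
            written.add(ins[1])
        else:
            if batch:
                rounds.append(batch)      # flush the current round
            batch, written = [], set()
            if ins[0] == "open":          # a dependent open starts a new round
                batch, written = [ins], {ins[1]}
    if batch:
        rounds.append(batch)
    return rounds
```

Under this rule, three independent opens followed by a local addition and a fourth open need two rounds instead of four.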

In Table 3 we present timings (again in elapsed wall time for a player) for multiplying two secret shared values. Results are given for three different varieties of multiplication, reflecting the possibilities available: purely sequential multiplications; parallel multiplications with communication merged into one round (50 per round); and parallel multiplications running in 4 independent threads (50 per round, per thread). The experiments were carried out on the same machines as the offline phase, running over a local network with a ping of around 0.27 ms. For comparison, the original implementation of the online phase in [14] gave an amortized time of 20000 multiplications per second over a 64-bit prime field, with three players.


Multiplications/sec
n   p ≈     Sequential (single thread)   50 in parallel (single thread)   50 in parallel (four threads)
2   2^32    7500                         134000                           398000
2   2^64    7500                         130000                           395000
2   2^128   7500                         120000                           358000
3   2^32    4700                         (illegible)                      292000
3   2^64    4700                         98000                            287000
3   2^128   4600                         90000                            260000

Table 3. Online Times
A Commitments in the Random Oracle Model

A.1 Protocol and Functionality


The Protocol $\Pi_{\mathsf{Commit}}$

Commit:

1. In order to commit to $v$, $P_i$ sets $o \leftarrow v \| r$, where $r$ is chosen uniformly in a determined domain, and queries the random oracle $\mathcal{H}$ to get $c \leftarrow \mathcal{H}(o)$.

2. $P_i$ then broadcasts $(c, i, \tau_v)$, where $\tau_v$ represents a handle for the commitment.

Open:

1. In order to open a commitment $(c, i, \tau_v)$, where $c = \mathcal{H}(v \| r)$, player $P_i$ broadcasts $(o = v \| r, i, \tau_v)$.

2. All players call $\mathcal{H}$ on $o$ and check whether $\mathcal{H}(o) = c$. Players accept if and only if this check passes.

Fig. 1. The Protocol for Commitments.


The Ideal Functionality $\mathcal{F}_{\mathsf{Commit}}$

Commit: On input (Commit, $v, i, \tau_v$) by $P_i$, or by the adversary on his behalf (if $P_i$ is corrupt), where $v$ is either in a specific domain or $\bot$, it stores $(v, i, \tau_v)$ on a list and outputs $(i, \tau_v)$ to all players and the adversary.

Open: On input (Open, $i, \tau_v$) by $P_i$, or by the adversary on his behalf (if $P_i$ is corrupt), the ideal functionality outputs $(v, i, \tau_v)$ to all players and the adversary. If (NoOpen, $i, \tau_v$) is given by the adversary, and $P_i$ is corrupt, the functionality outputs $(\bot, i, \tau_v)$ to all players.

Fig. 2. The Ideal Functionality for Commitments
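The bookkeeping performed by $\mathcal{F}_{\mathsf{Commit}}$ can be modelled as a small Python class. This is only an illustration, assuming that handles are arbitrary hashable labels and that `None` plays the role of $\bot$; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

BOTTOM = None  # represents the symbol ⊥


@dataclass
class IdealCommit:
    corrupt: set = field(default_factory=set)   # indices i of corrupt players
    stored: dict = field(default_factory=dict)  # (i, handle) -> committed value v

    def commit(self, v, i, handle):
        """(Commit, v, i, handle): store v and output (i, handle) to everyone."""
        self.stored[(i, handle)] = v
        return (i, handle)

    def open(self, i, handle):
        """(Open, i, handle): output the stored (v, i, handle) to everyone."""
        return (self.stored[(i, handle)], i, handle)

    def no_open(self, i, handle):
        """(NoOpen, i, handle): only accepted from the adversary for a corrupt P_i."""
        if i not in self.corrupt:
            raise ValueError("NoOpen is only allowed on behalf of a corrupt player")
        return (BOTTOM, i, handle)
```

Unlike $\Pi_{\mathsf{Commit}}$, the functionality never exposes anything about $v$ before Open, which is exactly what the simulator in the proof of Lemma 2 has to emulate.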


A.2 UC Security

Lemma 2. In the random oracle model, the protocol $\Pi_{\mathsf{Commit}}$ implements $\mathcal{F}_{\mathsf{Commit}}$ with computational security against any static, active adversary corrupting at most $n - 1$ parties.

Proof. We sketch here a simulator such that the environment cannot distinguish whether it is playing with the real protocol or with the functionality composed with the simulator. Note that the simulator answers the queries to the random oracle $\mathcal{H}$ made by the adversary.

To simulate a Commit call, if the committer $P_i$ is honest, the simulator selects a random value $c$ and gives $(c, i, \tau_v)$ to the adversary. Note that $(i, \tau_v)$ is given to the simulator by $\mathcal{F}_{\mathsf{Commit}}$ after it receives (Commit, $v, i, \tau_v$) from $P_i$. If instead the committer is corrupt, then it either queries $\mathcal{H}$ to get $c$, or it does not. Therefore, on receiving $(c^*, i, \tau_v)$ from the adversary, the simulator knows $v$ if $\mathcal{H}$ was queried (by looking up the query that was answered with $c^*$), and sets $v^* \leftarrow v$. If $\mathcal{H}$ was not queried, the simulator sets a dummy input $v^*$ and sets the internal flag Abort to true. It then sends (Commit, $v^*, i, \tau_v$) to $\mathcal{F}_{\mathsf{Commit}}$.

An Open call is simulated as follows. If the committer is honest, the simulator gets $(v, i, \tau_v)$ when $P_i$ inputs (Open, $i, \tau_v$) to $\mathcal{F}_{\mathsf{Commit}}$. The simulator selects a random $r$ and sets $o \leftarrow v \| r$. It can now hand $(o, i, \tau_v)$ to the adversary. If the random oracle is queried on $o$, the simulator returns $c$ as the response. If the committer is corrupt, the simulator gets $(i, \tau_v)$ from the adversary and checks whether Abort is true; if so, it sends (NoOpen, $i, \tau_v$) to $\mathcal{F}_{\mathsf{Commit}}$. Otherwise, the simulator sends (Open, $i, \tau_v$) to $\mathcal{F}_{\mathsf{Commit}}$.

The adversary can only notice that the queries to $\mathcal{H}$ are simulated if $o$ has been queried before and was answered with a value different from $c$; but as $r$ is random this happens only with negligible probability (assuming that the domain of $r$ and the output domain of $\mathcal{H}$ are large enough). Also, in a simulated run, an adversary that does not query $\mathcal{H}$ when committing causes an abort. In a real run, such an adversary avoids an abort only if it correctly guesses the output of $\mathcal{H}$, which happens with negligible probability.

□ 


B Key Generation: Protocol, Functionalities and Security Proof

B.1 Protocol


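The protocol $\Pi_{\mathsf{KeyGen}}$

Initialize:

1. Every player $P_i$ samples a uniform $e_i \leftarrow \{1, \dots, c\}$ and asks $\mathcal{F}_{\mathsf{Commit}}$ to broadcast the handle $\tau^e_i \leftarrow \mathsf{Commit}(e_i)$ for a commitment to $e_i$.

2. For every thread $j = 1, \dots, c$, every player $P_i$ samples a seed $s_{i,j}$ and asks $\mathcal{F}_{\mathsf{Commit}}$ to broadcast $\tau^s_{i,j} \leftarrow \mathsf{Commit}(s_{i,j})$.

3. Every player $P_i$ computes and broadcasts $a_{i,j} \leftarrow \mathcal{U}_{s_{i,j}}(q, \phi(m))$.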

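Stage 1:

4. All the players compute $a_j \leftarrow a_{1,j} + \dots + a_{n,j}$.

5. Every player $P_i$ computes $\mathfrak{s}_{i,j} \leftarrow \mathsf{HWT}_{s_{i,j}}(h, \phi(m))$ and $\epsilon_{i,j} \leftarrow \mathcal{DG}_{s_{i,j}}(\sigma^2, \phi(m))$, and broadcasts $b_{i,j} \leftarrow [a_j \cdot \mathfrak{s}_{i,j} + p \cdot \epsilon_{i,j}]_q$.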

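Stage 2:

6. All the players compute $b_j \leftarrow b_{1,j} + \dots + b_{n,j}$ and set $\mathfrak{pk}_j \leftarrow (a_j, b_j)$.

7. Every player $P_i$ computes and broadcasts $\mathfrak{enc}'_{i,j} \leftarrow \mathsf{Enc}_{\mathfrak{pk}_j}(-p_1 \cdot \mathfrak{s}_{i,j}, \mathcal{RC}_{s_{i,j}}(0.5, \sigma^2, \phi(m)))$.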

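Stage 3:

8. All the players compute $\mathfrak{enc}'_j \leftarrow \mathfrak{enc}'_{1,j} + \dots + \mathfrak{enc}'_{n,j}$.

9. Every player $P_i$ computes $\mathfrak{zero}_{i,j} \leftarrow \mathsf{Enc}_{\mathfrak{pk}_j}(0, \mathcal{RC}_{s_{i,j}}(0.5, \sigma^2, \phi(m)))$.

10. Every player $P_i$ computes and broadcasts $\mathfrak{enc}_{i,j} \leftarrow (\mathfrak{s}_{i,j} \cdot \mathfrak{enc}'_j) + \mathfrak{zero}_{i,j}$.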

Output:

11. All the players compute $\mathfrak{enc}_j \leftarrow \mathfrak{enc}_{1,j} + \dots + \mathfrak{enc}_{n,j}$ and set $\mathfrak{epk}_j \leftarrow (\mathfrak{pk}_j, \mathfrak{enc}_j)$.

12. Every player $P_i$ calls $\mathcal{F}_{\mathsf{Commit}}$ with $\mathsf{Open}(\tau^e_i)$. If any opening fails, the players output the numbers of the respective players, and the protocol aborts.

13. All players compute the challenge $chall \leftarrow 1 + ((\sum_{i=1}^{n} e_i) \bmod c)$.

14. Every player $P_i$ calls $\mathcal{F}_{\mathsf{Commit}}$ with $\mathsf{Open}(\tau^s_{i,j})$ for $j \neq chall$. If any opening fails, the players output the numbers of the respective players, and the protocol aborts.

15. All players obtain the committed values, compute all the derived values and check that they are correct.

16. If any of the checks fail, the players output the numbers of the respective players, and the protocol aborts. Otherwise, every player $P_i$ sets

- $\mathfrak{s}_i \leftarrow \mathfrak{s}_{i,chall}$,

- $\mathfrak{pk} \leftarrow (a_{chall}, b_{chall})$, $\mathfrak{epk} \leftarrow (\mathfrak{pk}, \mathfrak{enc}_{chall})$.

Fig. 3. The protocol for key generation.
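To make the structure of one thread of $\Pi_{\mathsf{KeyGen}}$ concrete, the following Python sketch runs Steps 3-6 for a single thread over a toy ring $\mathbb{Z}_q[X]/(X^N+1)$, and shows that a thread is a deterministic function of its seeds, which is what lets Step 15 recompute and check every opened thread. Several assumptions are made for illustration only: a power-of-two cyclotomic instead of a general $\phi(m)$, Python's `random.Random(seed)` in place of the seeded samplers $\mathcal{U}_s$, $\mathsf{HWT}_s$ and $\mathcal{DG}_s$, small uniform noise in place of the discrete Gaussian, and tiny, insecure parameters. All function names are hypothetical.

```python
import random

# Toy, insecure parameters (assumptions for illustration only).
N, Q, P = 8, 2**31 - 1, 3   # ring degree, modulus q, plaintext modulus p


def ring_add(*xs):
    """Coefficient-wise sum of ring elements, reduced mod Q."""
    return [sum(coeffs) % Q for coeffs in zip(*xs)]


def ring_mul(x, y):
    """Product in Z_Q[X]/(X^N + 1), i.e. a negacyclic convolution."""
    out = [0] * N
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            if i + j < N:
                out[i + j] = (out[i + j] + xi * yj) % Q
            else:  # X^N = -1 wraps the term around with a sign flip
                out[i + j - N] = (out[i + j - N] - xi * yj) % Q
    return out


def ring_scale(k, x):
    """Multiply every coefficient by the integer k, mod Q."""
    return [(k * coeff) % Q for coeff in x]


def sample_uniform(rng):          # stand-in for U_s(q, phi(m))
    return [rng.randrange(Q) for _ in range(N)]


def sample_hwt(rng, h):           # stand-in for HWT_s(h, phi(m)): binary, weight h
    key = [0] * N
    for pos in rng.sample(range(N), h):
        key[pos] = 1
    return key


def sample_noise(rng, bound=2):   # stand-in for DG_s(sigma^2, phi(m))
    return [rng.randint(-bound, bound) % Q for _ in range(N)]


def run_thread(seeds, h=3):
    """Steps 3-6 of one thread j, driven only by the seeds s_{1,j}, ..., s_{n,j}."""
    rngs = [random.Random(seed) for seed in seeds]
    a_shares = [sample_uniform(rng) for rng in rngs]            # step 3: a_{i,j}
    a = ring_add(*a_shares)                                     # step 4: a_j
    keys = [sample_hwt(rng, h) for rng in rngs]                 # step 5: key shares
    errs = [sample_noise(rng) for rng in rngs]                  # step 5: noise
    b_shares = [ring_add(ring_mul(a, key), ring_scale(P, err))  # step 5: b_{i,j}
                for key, err in zip(keys, errs)]
    b = ring_add(*b_shares)                                     # step 6: pk_j = (a_j, b_j)
    return a_shares, b_shares, a, b, keys, errs


def challenge(es, c):
    """Step 13: chall = 1 + (sum of the e_i mod c), always in {1, ..., c}."""
    return 1 + (sum(es) % c)
```

In a full run each player would execute `run_thread` for all $c$ threads; for every $j \neq chall$ the others recompute the thread from the opened seeds and compare the result with the broadcast $a_{i,j}$ and $b_{i,j}$. Because multiplication distributes over the shares, $b_j$ equals $a_j \cdot \sum_i \mathfrak{s}_{i,j} + p \cdot \sum_i \epsilon_{i,j}$.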
B.2 Functionality


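The ideal functionality $\mathcal{F}_{\mathsf{KeyGen}}$

Generation:

1. The functionality waits for seeds $\{s_i\}_{i \in A}$ from the adversary, and samples a seed $s_i$ for every honest player $P_i$.

2. It computes $a_i \leftarrow \mathcal{U}_{s_i}(q, \phi(m))$ for all $i$, and $a \leftarrow a_1 + \dots + a_n$.

3. It computes $\mathfrak{s}_i \leftarrow \mathsf{HWT}_{s_i}(h, \phi(m))$, $\epsilon_i \leftarrow \mathcal{DG}_{s_i}(\sigma^2, \phi(m))$ and $b_i \leftarrow [a \cdot \mathfrak{s}_i + p \cdot \epsilon_i]_q$ for all $i$.

4. It computes $b \leftarrow b_1 + \dots + b_n$.

5. It computes $\mathfrak{enc}'_i \leftarrow \mathsf{Enc}_{\mathfrak{pk}}(-p_1 \cdot \mathfrak{s}_i, \mathcal{RC}_{s_i}(0.5, \sigma^2, \phi(m)))$ for every $P_i$.

6. It computes $\mathfrak{enc}' \leftarrow \mathfrak{enc}'_1 + \dots + \mathfrak{enc}'_n$.

7. It computes $\mathfrak{zero}_i \leftarrow \mathsf{Enc}_{\mathfrak{pk}}(0, \mathcal{RC}_{s_i}(0.5, \sigma^2, \phi(m)))$ for every $P_i$, and $\mathfrak{enc}_i \leftarrow (\mathfrak{s}_i \cdot \mathfrak{enc}') + \mathfrak{zero}_i$ for all players $P_i$.

8. It leaks $a_i, b_i, \mathfrak{enc}'_i, \mathfrak{enc}_i$ for all $i$ to the adversary and waits for either Proceed, Cheat, or Abort.

Proceed: The functionality sends $\mathfrak{s}_i$, $\mathfrak{pk} = (a, b)$, $\mathfrak{epk} = (\mathfrak{pk}, \mathfrak{enc})$ to $P_i$, for all $i$.

Cheat: On input Cheat, with probability $1 - 1/c$ the functionality leaks NoSuccess and goes to “Abort”; otherwise:

1. It leaks the seeds of the honest parties and sends Success to the adversary.

2. It repeats the following loop:

- It waits for the adversary to input values $a^*_i$ (resp. $(\mathfrak{s}^*_i, b^*_i)$, $\mathfrak{enc}'^*_i$, $\mathfrak{enc}^*_i$) for $i \in A$.

- It overwrites $a_i$ (resp. $(\mathfrak{s}_i, b_i)$, $\mathfrak{enc}'_i$, $\mathfrak{enc}_i$) for $i \in A$.

- It recomputes $a$ (resp. $b$, $\mathfrak{enc}'$, $\mathfrak{enc}$) accordingly.

3. It waits for Proceed or Abort.

Abort:

1. The functionality leaks the seeds of the honest parties if it never did so.

2. It then waits for a set $S \subseteq A$, sends it to the honest players, and aborts.

Fig. 4. The ideal functionality for key generation.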


B.3  Proof  of  Theorem  1 

Proof. We build a simulator $\mathcal{S}_{\mathsf{KeyGen}}$ that works on top of the ideal functionality $\mathcal{F}_{\mathsf{KeyGen}}$, such that the environment cannot distinguish whether it is playing with the protocol $\Pi_{\mathsf{KeyGen}}$ and $\mathcal{F}_{\mathsf{Commit}}$, or with the simulator and $\mathcal{F}_{\mathsf{KeyGen}}$. The simulator is given in Figure 6.

We now proceed with the analysis of the simulation. Let $A$ denote the set of players controlled by the adversary. In steps 1 and 2 the simulator sends random handles to the adversary, as would happen in the real protocol. In steps 3-11 the simulation is perfect for all the threads where the simulator knows the seeds of the honest players, since those are generated as in the protocol. If there is neither a cheat nor an abort, the simulation is also perfect for the thread where the simulator does not know the seeds of the honest players, since the simulator forwards honest values provided by the functionality. In case of a cheat on the thread pointed to by chall, the simulator also gets the seeds for the remaining thread and replaces the honestly precomputed intermediate values $a_{i,chall}$, $\mathfrak{s}_{i,chall}$, $b_{i,chall}$, $\mathfrak{enc}'_{i,chall}$, $\mathfrak{enc}_{i,chall}$ with the ones compatible with the deviation of the adversary. The honest values computed after a cheat reflect the adversarial behaviour in the real protocol, so a simulated run is again indistinguishable from a real run of the protocol.

Steps 12 and 14 are statistically indistinguishable from a protocol run, since the simulator also plays the role of $\mathcal{F}_{\mathsf{Commit}}$.

Step 15 needs more work: we need to ensure that the success probability in a simulated run is the same as in a real run of the protocol. If the adversary does not deviate, the protocol succeeds. The same applies to a simulated run, since the simulator goes through “Pass” at every stage. In more detail, the simulator sampled and computed all the values at the threads not pointed to by the challenge as in an honest run of the protocol, while the values at the thread pointed to by the challenge are correctly evaluated and sent to the honest players by $\mathcal{F}_{\mathsf{KeyGen}}$. If the adversary cheats on only one thread, then in a real execution of the protocol it succeeds with probability $1/c$; the same holds in a simulated run, since the simulator goes through Cheat in CheatSwitch once and the functionality leaks Success with the same probability, after which the simulator does not abort. If the adversary deviates on more than one thread (considering all stages), both the real protocol and the simulation abort at step 15.

Finally,  if  the  protocol  aborts  due  to  failure  at  opening  commitments,  both  the  functionality  and  the  players  output 
the  numbers  of  corrupted  players  who  failed  to  open  their  commitments.  If  the  protocol  aborts  at  step  15,  the  output  is 
the  numbers  of  players  who  deviated  in  threads  other  than  chall  in  both  the  functionality  and  the  protocol.  □ 

The procedure CheatSwitch

Pass: Checks passed for all threads, or there was a successful cheat in earlier stages and checks now pass for at least all threads except for the one pointed to by chall.

- The simulator continues.

Cheat: There was no cheat in previous stages and all checks now pass except for the ones pointed to by a unique thread $j$.

- The simulator adds $(j, P_i)$ to a list $L$, for all players $P_i$ making the check not pass.

- The simulator sends Cheat to $\mathcal{F}_{\mathsf{KeyGen}}$, and gets and stores every honest seed $s_{i,chall}$.

- If the functionality sends NoSuccess:

• If $j = chall$, it resamples a different $chall \leftarrow \{1, \dots, c\} \setminus \{chall\}$.

It continues the simulation according to the protocol.

- If the functionality sends Success: the simulator sets $chall = j$ and continues according to the protocol.

Abort: In more than one thread checks did not pass (counting also checks in previous stages).

- The simulator adds $(j, P_i)$ to a list $L$, for all threads $j$ and players $P_i$ making the check not pass.

- The simulator sends Abort to $\mathcal{F}_{\mathsf{KeyGen}}$ and continues the rest of the simulation according to the protocol.

Fig. 5. The cheat switch.
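The case analysis of CheatSwitch can be summarised as a small decision function. This Python sketch is hypothetical: it tracks only which threads have failed a check so far, and it assumes that an earlier failing thread is still present only because the corresponding Cheat was successful (so that thread is chall).

```python
def cheat_switch(failed_before: set, failed_now: set) -> str:
    """Return 'Pass', 'Cheat' or 'Abort' for one check, following Fig. 5.

    failed_before: threads whose checks failed at earlier stages
    failed_now:    threads whose checks fail at the current stage
    """
    failed_total = failed_before | failed_now
    if len(failed_total) > 1:
        return "Abort"    # checks failed on more than one thread overall
    if not failed_now or failed_before:
        return "Pass"     # nothing failed now, or only the earlier (successfully cheated) thread
    return "Cheat"        # first deviation, confined to a unique thread j
```

For example, a first failure on thread 2 yields Cheat, a later failure on the same thread yields Pass, and a later failure on thread 3 yields Abort, mirroring the three branches of the figure.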
The simulator $\mathcal{S}_{\mathsf{KeyGen}}$

Initialize:

- In Step 1 the simulator obtains $e_i$ from every corrupt $P_i$, and broadcasts $\tau^e_i$ as $\mathcal{F}_{\mathsf{Commit}}$ would do. It samples chall uniformly in $\{1, \dots, c\}$, and it broadcasts a handle $\tau^e_i$ for every honest $P_i$.

- In Step 2 the simulator sees the random values $s_{i,j}$ for $i \in A$.

It inputs $\{s_{i,chall}\}_{i \in A}$ to $\mathcal{F}_{\mathsf{KeyGen}}$, therefore obtaining a full transcript of the thread corresponding to chall.

For the threads $j \neq chall$, for honest $P_i$, the simulator samples $s_{i,j}$ honestly and broadcasts a handle $\tau^s_{i,j}$ for every honest $P_i$ for every thread.

- In Step 3, the simulator computes $a_{i,j}$ honestly for $i \notin A$ and $j \neq chall$, while it defines $a_{i,chall}$ for $i \notin A$ as the values $a_i$ obtained from the transcript given by $\mathcal{F}_{\mathsf{KeyGen}}$.

It then broadcasts $a_{i,j}$ for honest $P_i$, waits for broadcasts $a_{i,j}$ by the corrupt players, and checks $a_{i,j} = \mathcal{U}_{s_{i,j}}(q, \phi(m))$ for all dishonest $P_i$. For this check the simulator enters CheatSwitch. If there was a successful cheat on the thread pointed to by chall, the simulator inputs $a_{i,chall}$ to $\mathcal{F}_{\mathsf{KeyGen}}$ for $i \in A$.

Stage 1:

- In Step 4 the simulator acts as in the protocol.

- In Step 5, for all the honest seeds that are known by the simulator, the simulator computes $b_{i,j}$ honestly for $i \notin A$.

If the simulator does not know the seeds $s_{i,chall}$ for honest $P_i$, it defines $b_{i,chall}$ for $i \notin A$ as the values $b_i$ obtained from the transcript given by $\mathcal{F}_{\mathsf{KeyGen}}$.

It then broadcasts $b_{i,j}$ for honest $P_i$ and waits for broadcasts $b_{i,j}$ by the corrupt players. It then checks $b_{i,j} = [a_j \cdot \mathsf{HWT}_{s_{i,j}}(h, \phi(m)) + p \cdot \mathcal{DG}_{s_{i,j}}(\sigma^2, \phi(m))]_q$ for all dishonest $P_i$. For this check the simulator enters CheatSwitch. If there was a successful cheat on the thread pointed to by chall, the simulator inputs $(\mathfrak{s}_{i,chall}, b_{i,chall})$ to $\mathcal{F}_{\mathsf{KeyGen}}$ for $i \in A$.

Stage 2:

- In Step 6 the simulator acts as in the protocol.

- In Step 7, for all the honest seeds that are known by the simulator, the simulator computes $\mathfrak{enc}'_{i,j}$ honestly for $i \notin A$. If the simulator does not know the seeds $s_{i,chall}$ for honest $P_i$, it defines $\mathfrak{enc}'_{i,chall}$ for $i \notin A$ as the values $\mathfrak{enc}'_i$ obtained from the transcript given by $\mathcal{F}_{\mathsf{KeyGen}}$.

It then broadcasts $\mathfrak{enc}'_{i,j}$ for honest $P_i$ and waits for broadcasts $\mathfrak{enc}'_{i,j}$ by the corrupt players. It then checks $\mathfrak{enc}'_{i,j} = \mathsf{Enc}_{\mathfrak{pk}_j}(-p_1 \cdot \mathfrak{s}_{i,j}, \mathcal{RC}_{s_{i,j}}(0.5, \sigma^2, \phi(m)))$ for all dishonest $P_i$. For this check the simulator enters CheatSwitch. If there was a successful cheat on the thread pointed to by chall, the simulator inputs $\mathfrak{enc}'_{i,chall}$ to $\mathcal{F}_{\mathsf{KeyGen}}$ for $i \in A$.

Stage 3:

- In Steps 8 and 9 the simulator acts as in the protocol.

- In Step 10, for all the honest seeds that are known by the simulator, the simulator computes $\mathfrak{enc}_{i,j}$ honestly for $i \notin A$. If the simulator does not know the seeds $s_{i,chall}$ for honest $P_i$, it defines $\mathfrak{enc}_{i,chall}$ for $i \notin A$ as the values $\mathfrak{enc}_i$ obtained from the transcript given by $\mathcal{F}_{\mathsf{KeyGen}}$.

It then broadcasts $\mathfrak{enc}_{i,j}$ for honest $P_i$ and waits for broadcasts $\mathfrak{enc}_{i,j}$ by the corrupt players. It then checks $\mathfrak{enc}_{i,j} = (\mathfrak{s}_{i,j} \cdot \mathfrak{enc}'_j) + \mathfrak{zero}_{i,j}$ for all dishonest $P_i$. For this check the simulator enters CheatSwitch. If there was a successful cheat on the thread pointed to by chall, the simulator inputs $\mathfrak{enc}_{i,chall}$ to $\mathcal{F}_{\mathsf{KeyGen}}$ for $i \in A$.

Output:

- Step 11 is performed according to the protocol.

- The simulator samples $e_i$ for $i \notin A$ uniformly such that $1 + ((\sum_{i=1}^{n} e_i) \bmod c) = chall$.

- Step 12 is performed according to the protocol, but the simulator opens $\tau^e_i$ revealing the values $e_i$ for all honest $P_i$, and if the check fails the simulator sends Abort to $\mathcal{F}_{\mathsf{KeyGen}}$ and inputs the set of all players failing in opening.

- Step 13 is performed according to the protocol.

- Step 14 is performed according to the protocol, and if the check fails the simulator sends Abort to $\mathcal{F}_{\mathsf{KeyGen}}$ and inputs the set of all players failing in opening.

- Step 15 is performed according to the protocol, and the simulator defines

$$S = \{\, i \in A \mid (j, P_i) \in L \text{ for some } j \in \{1, \dots, c\},\ j \neq chall \,\},$$

i.e. the set of corrupt players who cheated at any thread different from chall.

• If $S \neq \emptyset$ (i.e. cheats at a thread which is going to be opened), the simulator sends Abort to $\mathcal{F}_{\mathsf{KeyGen}}$ and inputs $S$.

• If $S = \emptyset$ (i.e. successful or no cheats), the simulator sends Proceed to $\mathcal{F}_{\mathsf{KeyGen}}$.

- Step 16 is performed according to the protocol.

Fig. 6. The simulator for the key generation functionality.
B.4 Semantic security of $\mathcal{F}_{\mathsf{KeyGen}}$

Here we prove the semantic security of the cryptosystem resulting from an execution of $\mathcal{F}_{\mathsf{KeyGen}}$, based on the ring-LWE problem and a form of KDM security for quadratic functions. The ring-LWE assumption we use takes an extra parameter $h$, as our scheme chooses binary, low Hamming weight secret keys for better efficiency and smaller parameters; note, however, that the results here also apply to secrets drawn from other distributions.

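Definition 1 (Decisional Ring Learning With Errors assumption). The single sample decisional ring-LWE assumption $\mathsf{RLWE}_{q,\sigma^2,h}$ states that

$$(a,\ a \cdot \mathfrak{s} + e) \approx (a,\ u)$$

where $\mathfrak{s} \leftarrow \mathsf{HWT}(h, \phi(m))$, $e \leftarrow \mathcal{DG}(\sigma^2, \phi(m))$, and $a, u$ are uniform over $R_q$.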

The KDM security assumption below can be viewed as a distributed extension of the usual key-switching assumption for FHE schemes. In this case we need ‘encryptions’ of quadratic functions of additive shares of the secret key to remain secure. Note that whilst it is easy to show KDM security for linear functions of the secret [7], it is not known how to extend this to the functions required here without increasing the length of ciphertexts.

Definition 2 (KDM security assumption). If $\mathfrak{s}_i \leftarrow \mathsf{HWT}(h, \phi(m))$, $\mathfrak{s} = \sum_{i=0}^{n-1} \mathfrak{s}_i$ and $f$ is any degree-2 polynomial, then

$$\big(a,\ a \cdot \mathfrak{s} + p \cdot e + f(\mathfrak{s}_0, \dots, \mathfrak{s}_{n-1})\big) \approx \big(a,\ a \cdot \mathfrak{s} + p \cdot e\big)$$

where $a \leftarrow \mathcal{U}(q, \phi(m))$ and $e \leftarrow \mathcal{DG}(\sigma^2, \phi(m))$.

The following lemma states that distinguishing any number of ‘amortized’ ring-LWE samples with different, independent secret keys but a common first component $a$ from uniform is as hard as distinguishing just one ring-LWE sample from uniform. It was proven for the (standard) LWE setting with $n = 3$ in [23]; here we need a version with ring-LWE for any $n$.

Lemma 3 (Adapted from [23, Lemma 7.6]). Suppose $a, u_i \leftarrow \mathcal{U}(q, \phi(m))$, $\mathfrak{s}_i \leftarrow \mathsf{HWT}(h, \phi(m))$ and $e_i \leftarrow \mathcal{DG}(\sigma^2, \phi(m))$ for $i = 0, \dots, n-1$, $n \in \mathbb{N}$. Then

$$\{(a,\ a \cdot \mathfrak{s}_i + e_i)\}_i \approx \{(a,\ u_i)\}_i$$

under the single sample ring-LWE assumption $\mathsf{RLWE}_{q,\sigma^2,h}$.

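Proof. Suppose an adversary $\mathcal{A}$ can distinguish between the above distributions with non-negligible advantage $\varepsilon$. We construct an adversary $\mathcal{B}$ that solves the RLWE problem. Given a challenge $(a, b)$ from the RLWE oracle, $\mathcal{B}$ picks a uniform index $k \in \{0, \dots, n-1\}$, sets $b_i \leftarrow \mathcal{U}(q, \phi(m))$ for $i < k$, $b_k = b$, and $b_i = a \cdot \mathfrak{s}_i + e_i$ for $k < i \leq n-1$, where $e_i \leftarrow \mathcal{DG}(\sigma^2, \phi(m))$ and $\mathfrak{s}_i \leftarrow \mathsf{HWT}(h, \phi(m))$. $\mathcal{B}$ sends all pairs $(a, b_i)$ to $\mathcal{A}$ and returns the output of $\mathcal{A}$ in response to the challenge.

The pairs $(a, b_i)$ for $i > k$ are valid amortized ring-LWE samples with fresh secrets and the pairs for $i < k$ are uniform, so the only component that depends on the challenge is $b_k$. If $b$ is a ring-LWE sample, $\mathcal{A}$ sees the hybrid in which the first $k$ second components are uniform; if $b$ is uniform, it sees the hybrid in which the first $k+1$ are uniform. Averaged over $k$, these hybrids telescope between the two distributions of the lemma, so the advantage of $\mathcal{B}$ in solving $\mathsf{RLWE}_{q,\sigma^2,h}$ is at least $\varepsilon / n$, which is non-negligible for polynomially many samples. □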

Theorem 6 (restatement of Theorem 2). If the functionality $\mathcal{F}_{\mathsf{KeyGen}}$ is used to produce a public key $\mathfrak{epk}$ and secret keys $\mathfrak{s}_i$ for $i = 0, \dots, n-1$, then the resulting cryptosystem is semantically secure based on the hardness of $\mathsf{RLWE}_{q,\sigma^2,h}$ and the KDM security assumption.

Proof. Suppose there is an adversary $\mathcal{A}$ that can interact with $\mathcal{F}_{\mathsf{KeyGen}}$ and distinguish the public key $(\mathfrak{pk}, \mathfrak{epk})$ from uniform. We construct an algorithm $\mathcal{B}$ that distinguishes amortized ring-LWE samples from uniform. By Lemma 3 this is at least as hard as breaking single sample ring-LWE. If the public key is pseudorandom then semantic security of encryption easily follows, as ciphertexts are just ring-LWE samples. Note that we only consider a non-cheating adversary: if $\mathcal{A}$ cheats, then it can trivially break the scheme with non-negligible probability $1/c$.
The challenger gives $\mathcal{B}$ the values $a_c, b_{c,0}, \dots, b_{c,n-1}$. $\mathcal{B}$ must now simulate an execution of $\mathcal{F}_{\mathsf{KeyGen}}$ with $\mathcal{A}$ to determine whether the challenge is uniform or of the form $(a_c, a_c \cdot \mathfrak{s}_i + e_i)$ for $\mathfrak{s}_i \leftarrow \mathsf{HWT}(h, \phi(m))$ and $e_i \leftarrow \mathcal{DG}(\sigma^2, \phi(m))$.

To start with, we receive the adversary's seeds $s_i$ for every corrupt player $i \in A$. We must then simulate the values $a_i, b_i, \mathfrak{enc}'_i, \mathfrak{enc}_i$ (for all $i$) that are leaked to the adversary in $\mathcal{F}_{\mathsf{KeyGen}}$. For corrupt players we simply compute these values according to $\mathcal{F}_{\mathsf{KeyGen}}$ using the adversary's seeds. Next we have to simulate the honest players' values, which we do using the challenge $a_c, b_{c,0}, \dots, b_{c,n-1}$. First we scale the challenge by $p$ (writing $a_c$ again for $p \cdot a_c$), so that it takes the form $(a_c, a_c \cdot \mathfrak{s}_i + p \cdot e_i)$ if the values are genuine RLWE samples. Since $p$ is coprime to $q$, this still has the same distribution as the original challenge.

Now $\mathcal{B}$ calculates uniform consistent shares $a_{c,i}$ of $a_c$ for every honest player $P_i$, and sends $\mathcal{A}$ the pairs $(a_{c,i}, b_{c,i})$. If the challenge values are amortized ring-LWE samples, then these are consistent with the pairs $(a_i, b_i)$ computed by $\mathcal{F}_{\mathsf{KeyGen}}$, since $a$ is uniform and $b_i = a \cdot \mathfrak{s}_i + p \cdot \epsilon_i$.

Next, $\mathcal{B}$ must provide $\mathcal{A}$ with simulations of the players' contributions to the key-switching data $\mathfrak{enc}'_i, \mathfrak{enc}_i$ for all honest players $P_i$. For both of these sets of values, $\mathcal{B}$ simply re-randomizes the pair $(a_c, b_c)$ and sends this to $\mathcal{A}$. This can be done by, for example, computing an encryption of zero under the public key $(a_c, b_c)$ (where $b_c = \sum_i b_{c,i}$). Notice that $\mathfrak{enc}'_i$ is just an encryption of $-p_1 \cdot \mathfrak{s}_i$ under the public key $(a, b)$, and so by the KDM security assumption is (perfectly) indistinguishable from a re-randomized version of $(a, b)$. For $\mathfrak{enc}_i$, recall that $\mathcal{F}_{\mathsf{KeyGen}}$ computes $\mathfrak{enc}_i = \mathfrak{s}_i \cdot \mathfrak{enc}' + \mathfrak{zero}_i$. Now writing $\mathfrak{zero}_i = (a \cdot v_i + p \cdot e_{0,i},\ b \cdot v_i + p \cdot e_{1,i})$ and $\mathfrak{enc}' = (a \cdot v + p \cdot e_0,\ b \cdot v + p \cdot e_1 - p_1 \cdot \mathfrak{s})$, we see that

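$$
\begin{aligned}
\mathfrak{enc}_i &= \big(a \cdot v \cdot \mathfrak{s}_i + a \cdot v_i + p \cdot (e_0 \cdot \mathfrak{s}_i + e_{0,i}),\ b \cdot v \cdot \mathfrak{s}_i + b \cdot v_i + p \cdot (e_1 \cdot \mathfrak{s}_i + e_{1,i}) - p_1 \cdot \mathfrak{s} \cdot \mathfrak{s}_i\big) \\
&= \big(\underbrace{a \cdot (v \cdot \mathfrak{s}_i + v_i)}_{a'} + p \cdot \underbrace{(e_0 \cdot \mathfrak{s}_i + e_{0,i})}_{e'_{0,i}},\ \underbrace{a \cdot (v \cdot \mathfrak{s}_i + v_i)}_{a'} \cdot \mathfrak{s} + p \cdot \underbrace{(e_1 \cdot \mathfrak{s}_i + e_{1,i} + e \cdot (v \cdot \mathfrak{s}_i + v_i))}_{e'_{1,i}} - p_1 \cdot \mathfrak{s} \cdot \mathfrak{s}_i\big) \\
&= \big(a' + p \cdot e'_{0,i},\ a' \cdot \mathfrak{s} + p \cdot e'_{1,i} - p_1 \cdot \mathfrak{s} \cdot \mathfrak{s}_i\big),
\end{aligned}
$$

where the second equality substitutes $b = a \cdot \mathfrak{s} + p \cdot e$ with $e = \sum_i \epsilon_i$.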
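Because the expansion above only uses the commutative-ring laws and the relation $b = a \cdot \mathfrak{s} + p \cdot e$, it can be sanity-checked with random elements of any commutative ring. The Python sketch below uses $\mathbb{Z}_q$ for a toy prime $q$ instead of $R_q$, and an arbitrary stand-in value for $p_1$; both are assumptions made only for this check. In the code, `s` is the full key $\mathfrak{s}$ and `s_i` is the share $\mathfrak{s}_i$.

```python
import random

# Toy choices made only for this check (assumptions): Z_Q stands in for R_q,
# and P1 is an arbitrary stand-in value for the modulus factor p_1.
Q = 2**61 - 1
P, P1 = 3, 1_000_003


def check_enc_i_expansion(trials=100, seed=0):
    """Compare s_i * enc' + zero_i with (a' + p e'_0, a' s + p e'_1 - p_1 s s_i)."""
    rng = random.Random(seed)

    def draw():
        return rng.randrange(Q)

    for _ in range(trials):
        a, s, s_i, e = draw(), draw(), draw(), draw()
        v, v_i, e0, e1, e0i, e1i = (draw() for _ in range(6))
        b = (a * s + P * e) % Q                          # public key (a, b)
        enc_prime = ((a * v + P * e0) % Q,               # enc' = Enc(-p_1 * s)
                     (b * v + P * e1 - P1 * s) % Q)
        zero_i = ((a * v_i + P * e0i) % Q,               # zero_i = Enc(0)
                  (b * v_i + P * e1i) % Q)
        enc_i = tuple((s_i * x + z) % Q for x, z in zip(enc_prime, zero_i))

        a_prime = a * (v * s_i + v_i) % Q                # a'
        e0_prime = e0 * s_i + e0i                        # e'_{0,i}
        e1_prime = e1 * s_i + e1i + e * (v * s_i + v_i)  # e'_{1,i}
        expected = ((a_prime + P * e0_prime) % Q,
                    (a_prime * s + P * e1_prime - P1 * s * s_i) % Q)
        if enc_i != expected:
            return False
    return True
```

Dropping the factor $(v \cdot \mathfrak{s}_i + v_i)$ from the $e$ term in `e1_prime` makes the check fail, which is why it appears inside $e'_{1,i}$.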

Notice that the first component of $\mathfrak{enc}_i$ corresponds to the second half of a ring-LWE sample $(a,\ a \cdot (v \cdot \mathfrak{s}_i + v_i) + p \cdot e'_{0,i})$ with secret $v \cdot \mathfrak{s}_i + v_i$. The second component of $\mathfrak{enc}_i$ corresponds to a ring-LWE sample with secret $\mathfrak{s}$ and first half $a'$, with an added quadratic function of the key, $-p_1 \cdot \mathfrak{s} \cdot \mathfrak{s}_i$. By the KDM security assumption, this is indistinguishable from a genuine ring-LWE sample, so $\mathfrak{enc}_i$ can also be perfectly simulated by re-randomizing $(a_c, b_c)$.

To finish the simulated execution of $\mathcal{F}_{\mathsf{KeyGen}}$, $\mathcal{B}$ sends $\mathcal{A}$ shares of the secret key for all $P_i$ with $i \in A$ (i.e. all dishonest players), sampling the randomness with the seeds that were provided to $\mathcal{B}$ at the beginning. $\mathcal{B}$ then waits for $\mathcal{A}$ to give an answer and returns it in response to the challenger. Notice that throughout the simulation, all values passed to $\mathcal{A}$ were ring-LWE samples derived from the challenge $(a_c, b_{c,0}, \dots, b_{c,n-1})$. We showed that if the challenge is an amortized ring-LWE sample then $\mathcal{A}$'s input is indistinguishable from the output of $\mathcal{F}_{\mathsf{KeyGen}}$, whereas if the challenge is uniform then so is $\mathcal{A}$'s input. Therefore, if $\mathcal{A}$ is successful in distinguishing the resulting public key from uniform, then $\mathcal{B}$ solves the ring-LWE challenge.

□ 
C EncCommit: Protocol, Functionalities and Security Proofs


C.1 Protocol


Protocol $\Pi_{\mathsf{EncCommit}}$

Usage: The specific distribution of the message is defined by the input parameter cond. The output is a single message $m_i$ private to each player, and a public ciphertext $c_i$ from player $i$. The protocol runs in two phases: a commitment phase and an opening phase.

KeyGen: The players execute $\Pi_{\mathsf{KeyGen}}$ to obtain $\mathfrak{s}_i$, $\mathfrak{pk}$, and $\mathfrak{epk}$.

Commitment Phase:

1. Every player $P_i$ samples a uniform $e_i \leftarrow \{1, \dots, c\}$, and queries $\mathsf{Commit}(e_i)$ to $\mathcal{F}_{\mathsf{Commit}}$, which broadcasts a handle $\tau^e_i$.

2. For $j = 1, \dots, c$:

(a) Every player $P_i$ samples a seed $s_{i,j}$ and queries $\mathsf{Commit}(s_{i,j})$ to $\mathcal{F}_{\mathsf{Commit}}$, which broadcasts a handle $\tau^s_{i,j}$.

(b) Every player $P_i$ generates $m_{i,j}$ according to cond using $\mathsf{PRF}_{s_{i,j}}$.

(c) Every player $P_i$ computes and broadcasts $c_{i,j} \leftarrow \mathsf{Enc}_{\mathfrak{pk}}(m_{i,j})$ using $\mathsf{PRF}_{s_{i,j}}$ to generate the randomness.

3. Every player $P_i$ calls $\mathcal{F}_{\mathsf{Commit}}$ with $\mathsf{Open}(\tau^e_i)$. All players get $e_i$. If any opening fails, the players output the numbers of the respective players, and the protocol aborts.

4. All players compute $chall \leftarrow 1 + ((\sum_{i=1}^{n} e_i) \bmod c)$.

Opening Phase:

5. Every player $P_i$ calls $\mathcal{F}_{\mathsf{Commit}}$ with $\mathsf{Open}(\tau^s_{i,j})$ for all $j \neq chall$, so that all players obtain the values $s_{i,j}$ for $j \neq chall$. If any opening fails, the players output the numbers of the respective players, and the protocol aborts.

6. For all $j \neq chall$ and all $i' \leq n$, the players check whether $c_{i',j}$ was generated correctly using $s_{i',j}$. If not, they output the numbers of the respective players $i'$, and the protocol aborts.

7. Otherwise, every player $P_i$ stores $\{c_{i',chall}\}_{i' \leq n}$ and $m_{i,chall}$.

Fig. 7. Protocol that allows ciphertexts to be used as commitments for plaintexts
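The cut-and-choose in Steps 3-6 fixes the probability that a single bad copy survives. The following Python sketch (function names hypothetical) enumerates the possible $e$ value of one honest player, assuming the corrupt $e_i$ are already fixed because they were committed beforehand. It confirms that the unopened copy is uniform over $\{1, \dots, c\}$, so an adversary that deviates in exactly one copy escapes detection with probability exactly $1/c$, the same $1/c$ that appears in the Cheat branches of the ideal functionalities.

```python
from fractions import Fraction


def challenge(es, c):
    """Step 4: chall = 1 + (sum of all e_i mod c)."""
    return 1 + (sum(es) % c)


def opened_copies(chall, c):
    """Step 5: every copy except chall is opened and checked."""
    return [j for j in range(1, c + 1) if j != chall]


def undetected_probability(corrupt_es, bad_copy, c):
    """Probability that the single cheated copy is the unopened one,
    over a uniform e in {1, ..., c} chosen by one honest player."""
    hits = sum(1 for e_honest in range(1, c + 1)
               if challenge(corrupt_es + [e_honest], c) == bad_copy)
    return Fraction(hits, c)
```

As `e_honest` runs over $\{1, \dots, c\}$, the sum runs over every residue modulo $c$ exactly once, so exactly one value makes the bad copy the unopened one, whatever the corrupt players committed to.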


C.2  Functionalities 
C.3  Proof  of  Theorem  3 

Proof. We construct a simulator $\mathcal{S}_{\mathsf{SHE}}$ working on top of $\mathcal{F}_{\mathsf{SHE}}$ such that the environment cannot distinguish whether it is playing with the real protocol $\Pi_{\mathsf{EncCommit}}$ and $\mathcal{F}_{\mathsf{KeyGen}}$ or with $\mathcal{F}_{\mathsf{SHE}}$ and $\mathcal{S}_{\mathsf{SHE}}$. The simulator is given in Figure 9.

Calls to $\mathcal{F}_{\mathsf{KeyGen}}$ are simulated as in $\mathcal{S}_{\mathsf{KeyGen}}$. We now focus on the commitment phase.

Let $A$ be the set of indices of corrupted players. The simulator starts by assuming that the adversary will behave honestly. It samples a uniform $j_0 \leftarrow \{1, \dots, c\}$ and seeds for the honest players. If the adversary does not deviate, then round $j_0$ will remain unopened; otherwise the simulator will have to adjust this. We can simulate each round $j$ as follows. First, the simulator gets the corrupted seeds $s_{i,j}$ for $i \in A$ when the adversary commits to them in step 2a. It returns random handles on behalf of each honest player $P_i$.

If $j \neq j_0$, the simulator engages with the adversary in a normal run of steps 2b to 2c using seeds $s_{i,j}$ for each honest player $P_i$. Since the simulator knows the corrupt seeds of the current round $j$, it can check whether the adversary behaved honestly. If the adversary did not, then the simulator stores index $j$ in the cheating list.

If $j = j_0$, the simulator again checks whether the adversary computed the right encryptions $\{c_{i,j_0}\}_{i \in A}$. If it did not, the simulator stores $j_0$ in the cheating list. Then the simulator calls EncCommit on $\mathcal{F}_{\mathsf{SHE}}$ with seeds $\{s_{i,j_0}\}_{i \in A}$ and gets back $\{c_i\}_{i \notin A}$, which are the values computed by the functionality. It then sets $c_{i,j_0} \leftarrow c_i$ and passes them on to the adversary in step 2c.

Once  the  last  round  is  finished,  the  simulator  checks  the  cheating  list.  There  are  three  possibilities: 
The ideal functionality $\mathcal{F}_{\mathsf{SHE}}$

Usage: The functionality is split into a one-run stage which computes the key material and a stage which can be accessed several times and is targeted to replace the zero-knowledge protocols in [14].

KeyGen: On input KeyGen the functionality acts as a copy of $\mathcal{F}_{\mathsf{KeyGen}}$.

Notice that all the variables used during this call are available for later use.

EncCommit: On input EncCommit the functionality does the following.

Initialize: Denote by $A$ the set of indices of corrupt players. On input Start by all players, sample seeds $\{s_i\}_{i \notin A}$ at random and wait for the corrupted seeds $\{s_i\}_{i \in A}$ from the adversary.

Computation:

1. It sets $m_i \leftarrow \mathsf{PRF}_{s_i}$ subject to condition cond.

2. It sets $c_i = \mathsf{Enc}_{\mathfrak{pk}}(m_i, \mathcal{RC}_{s_i}(0.5, \sigma^2, \phi(m)))$ for each player $P_i$.

3. It gives $\{c_i\}_{i \notin A}$ to the adversary, and waits for the signal Deliver, Cheat or Abort.

Delivery: The functionality sends $m_i, \{c_j\}_{j \leq n}$ to player $P_i$.

Cheat: The functionality gives $\{s_i\}_{i \notin A}$ to the adversary, then it does one of the following:

- With probability $1/c$ it sends Success to the adversary, waits for $\{m_i, c_i\}_{i \in A}$, and outputs $m_i, \{c_j\}_{j \leq n}$ to player $P_i$.

- Otherwise it sends NoSuccess to the adversary, and goes to Abort.

Abort: The functionality waits for the adversary to input $S \subseteq A$, and outputs $S$ to all players.

Fig. 8. The ideal functionality for key generation and EncCommit.


- The list is empty. In other words, the adversary behaved honestly. The simulator sets $chall \leftarrow j_0$, and sends Deliver to the functionality if all commitments are successfully opened. The output of $\mathcal{F}_{\mathsf{SHE}}$ and what the adversary has already seen will be consistent, since $\mathcal{F}_{\mathsf{SHE}}$ was called in round $j_0$ with the right seeds $\{s_{i,j_0}\}_{i \in A}$.

- The list contains only one index $j_1$. In this case the simulator sends Cheat and gets in return the seeds $\{s_i\}_{i \notin A}$ used by the functionality. It sets $s_{i,j_0} \leftarrow s_i$ for each honest player $P_i$. It then waits for the answer.

• If the functionality returns Success, the simulator has to make the adversary believe that round $j_1$ will remain unopened. It sets $chall \leftarrow j_1$. If all commitments are successfully opened, it sends $\{m_{i,j_1}, c_{i,j_1}\}_{i \in A}$ to the functionality in order to make the players' outputs consistent with what the adversary has already seen.

• If it returns NoSuccess, the simulator has to make the adversary believe that round $j_1$ will be opened. Therefore it samples $chall \leftarrow \{1, \dots, c\} \setminus \{j_1\}$.

- The list contains at least two indices $j_1, j_2$. In this case the real protocol would result in abort, so the simulator sends Abort to the functionality and sets $chall \leftarrow j_0$.

Later the simulator generates the value e_i for each honest player such that 1 + ((Σ_{i=1}^n e_i) mod c) = chall. This ensures that once the challenge is computed, it will point to a round in the same fashion as the protocol would. Moreover, opening τ_i^e to (any) e_i gives the adversary no clue as to whether it is playing in a real run of the protocol or in a simulated one.
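The constraint on the honest e_i can be met by sampling all but one honest value uniformly and solving for the last one. The sketch below assumes the simulator already knows the corrupted players' e_i (it plays F_Commit) and that at least one player is honest; challenge and simulate_honest_es are illustrative names, not part of the protocol description.

```python
import random


def challenge(es, c):
    """Challenge round as in the text: 1 + ((e_1 + ... + e_n) mod c)."""
    return 1 + (sum(es) % c)


def simulate_honest_es(corrupt_es, n_honest, chall, c, rng=random):
    """Honest e_i in {1, ..., c}, uniform subject to challenge(...) == chall."""
    assert n_honest >= 1 and 1 <= chall <= c
    es = [rng.randint(1, c) for _ in range(n_honest - 1)]
    # The last honest value must make the total ≡ chall - 1 (mod c).
    residue = (chall - 1 - sum(corrupt_es) - sum(es)) % c
    es.append(residue if residue != 0 else c)  # c ≡ 0 (mod c), keeps range
    return es
```

Because all but the last honest value are uniform and the last is determined, the resulting tuple is uniform among those that hit chall.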

In the opening phase, the simulator gives {e_i}_{i∉A} and the honest shares to the adversary, and if there was cheating with no success, it also sends Abort on behalf of each honest player.

It is clear from the construction of F_SHE that all the messages generated by the simulator are indistinguishable from those of a real run of the protocol. The simulator does the same computations, except in round j_0, where the computation is done by the functionality and the values are then passed on to the simulator, which forwards them to the adversary.

Finally, if the protocol aborts due to a failure in opening commitments, both the functionality and the players output the indices of the corrupted players who failed to open their commitments. If the protocol aborts at step 6, the output, in both the functionality and the protocol, is the indices of the players who deviated in threads other than chall. □
The simulator S_SHE

KeyGen: S_SHE acts as S_KeyGen, but S_SHE calls F_SHE on query KeyGen when S_KeyGen would have called F_KeyGen.

Commitment Phase:

- The simulator chooses a random round j_0 ← {1, ..., c} and random seeds.

- Acting as the F_Commit functionality, in response to the queries in steps 1 and 2a, for j = 1, ..., c the simulator samples s_{i,j} according to the protocol for i ∉ A and returns random handles {τ_i^e}_{i≤n}, {τ_{i,j}}_{i≤n}.

- For j = 1, ..., c, the simulator does the following:

• If j ≠ j_0, it performs steps 2b and 2c according to the protocol, using honest seeds s_{i,j} for each i ∉ A.

• If j = j_0, it calls F_SHE on query EncCommit on the corrupted seeds {s_{i,j_0}}_{i∈A} and gets back honest encryptions {c_i}_{i∉A}. It then sets c_{i,j_0} ← c_i for each i ∉ A.

- In step 2c, the simulator receives encryptions c_{i,j} for each i ∈ A and j ∈ {1, ..., c}. It generates m_{i,j} subject to cond, computes c'_{i,j} ← Enc_pk(m_{i,j}), and checks whether c_{i,j} = c'_{i,j}. If the equality does not hold, it stores j in a (cheating) list.

-  The  simulator  reads  the  cheating  list.  There  are  three  possibilities: 

• The list is empty. The simulator sets chall ← j_0.

• The list contains only one index j_1. The simulator sends Cheat to F_SHE and gets {s_i}_{i∉A} back. It then sets s_{i,j_0} ← s_i for each i ∉ A.

* If the functionality returns Success, the simulator sets chall ← j_1.

* If the functionality returns NoSuccess, the simulator samples chall ← {1, ..., c} \ {j_1}.

• The list contains at least two indices. The simulator sends Abort to F_SHE, gets {s_i}_{i∉A}, sets s_{i,j_0} ← s_i for each i ∉ A, and sets chall ← j_0.

- For all honest P_i the simulator samples e_i uniformly in {1, ..., c} subject to the constraint 1 + ((Σ_{i=1}^n e_i) mod c) = chall.

- In step 3, the simulator opens the handle τ_i^e to the freshly defined value e_i, for all honest P_i. If the adversary fails to open some of the commitments of corrupted players, the simulator sends Abort and the indices of the respective players to F_SHE, and it stops.

-  Step  4  is  performed  according  to  the  protocol. 

Opening  Phase: 

- In step 5, the simulator opens the handle τ_{i,j} to s_{i,j} for all honest players i ∉ A and j ≠ chall. If the adversary fails to open some of the commitments of corrupted players, the simulator sends Abort and the indices of the respective players to F_SHE, and it stops.

- If the cheating list is empty, the simulator sends Deliver to F_SHE.

- If the functionality returned Success earlier, the simulator inputs {m_{i,chall}, c_{i,chall}}_{i∈A} to the functionality.

- If the functionality returned NoSuccess, or if the cheating list has at least two indices, the simulator inputs to the functionality the indices of the players i ∈ A whose c_{i,j} were computed incorrectly for some j ≠ chall.

Fig. 9. The simulator for F_SHE
D Offline Phase: Protocols, Functionalities and Simulators

D.1 Protocols


Protocol  MACCheck 

Usage: Each player has input α_i and (γ(a_j)_i) for j = 1, ..., t. All players have a public set of opened values {a_1, ..., a_t}; the protocol either succeeds or outputs failure if an inconsistent MAC value is found.

MACCheck({a_1, ..., a_t}):

1. Every player P_i samples a seed s_i and asks F_Commit to broadcast τ_i^s ← Commit(s_i).

2. Every player P_i calls F_Commit with Open(τ_i^s), and all players obtain s_j for all j.

3. Set s ← s_1 ⊕ ··· ⊕ s_n.

4. Players sample a random vector r = U_s(p, t); note that all players obtain the same vector, as they have agreed on the seed s.

5. Each player computes the public value a ← Σ_{j=1}^t r_j · a_j.

6. Player i computes γ_i ← Σ_{j=1}^t r_j · γ(a_j)_i and σ_i ← γ_i − α_i · a.

7. Player i asks F_Commit to broadcast τ_i^σ ← Commit(σ_i).

8. Every player calls F_Commit with Open(τ_i^σ), and all players obtain σ_j for all j.

9. If σ_1 + ··· + σ_n ≠ 0, the players output ∅ and abort.

Fig.  10.  Method  To  Check  MACs  On  Partially  Opened  Values 
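The arithmetic of MACCheck can be exercised in the clear. In the sketch below, F_Commit is replaced by plain broadcast (so steps 1, 2, 7 and 8 collapse), Python's random.Random seeded with s stands in for U_s(p, t) even though it is not a cryptographic PRG, and P is an arbitrary toy prime; all of these are assumptions made only for illustration.

```python
import random

P = 2**61 - 1  # toy prime modulus (assumption; any prime p works)


def share(x, n, rng):
    """Additively share x in F_P among n players."""
    parts = [rng.randrange(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts


def mac_check(opened, alpha_shares, mac_shares, seeds):
    """Steps 3-9 of MACCheck with F_Commit replaced by plain broadcast.

    opened[j]:        public opened value a_j
    alpha_shares[i]:  player i's share alpha_i of the MAC key
    mac_shares[i][j]: player i's share gamma(a_j)_i
    seeds[i]:         player i's integer seed s_i
    """
    s = 0
    for s_i in seeds:
        s ^= s_i  # step 3: s = s_1 xor ... xor s_n
    # Step 4: every player derives the same r from s. random.Random is only
    # a stand-in for U_s(p, t); it is not a cryptographically secure PRG.
    prg = random.Random(s)
    r = [prg.randrange(P) for _ in opened]
    a = sum(r_j * a_j for r_j, a_j in zip(r, opened)) % P  # step 5
    sigmas = []
    for alpha_i, gammas_i in zip(alpha_shares, mac_shares):  # step 6
        gamma_i = sum(r_j * g for r_j, g in zip(r, gammas_i)) % P
        sigmas.append((gamma_i - alpha_i * a) % P)
    return sum(sigmas) % P == 0  # step 9: pass iff the sigmas sum to 0
```

With correctly MAC'd values the σ_i sum to zero; changing a single opened value a_1 by +1 shifts the sum by −α · r_1, so the check fails unless α or r_1 is zero.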


Protocol  Reshare 

Usage: Input is c_m, where c_m = Enc_pk(m) is a public ciphertext, and a parameter enc, where enc = NewCiphertext or enc = NoNewCiphertext. Output is a share m_i of m to each player P_i and, if enc = NewCiphertext, a ciphertext c'_m. The idea is that c_m could be a product of two ciphertexts, which Reshare converts to a "fresh" ciphertext c'_m. Since Reshare uses distributed decryption (which may return an incorrect result), it is not guaranteed that c_m and c'_m contain the same value, but it is guaranteed that Σ_i m_i is the value contained in c'_m.
Reshare(c_m, enc):

1. The players run F_SHE on query EncCommit(R_p) so that player i obtains plaintext f_i and all players obtain c_{f_i}, an encryption of f_i.

2. The players compute c_f ← c_{f_1} + ··· + c_{f_n} and c_{m+f} ← c_m + c_f. We define f = f_1 + ··· + f_n, although no party can compute f.

3. The players invoke Protocol DistDec to decrypt c_{m+f} and thereby obtain m + f.

4. P_1 sets m_1 ← m + f − f_1, and each player P_i (i ≠ 1) sets m_i ← −f_i.

5. If enc = NewCiphertext, all players set c'_m ← Enc_pk(m + f) − c_{f_1} − ··· − c_{f_n}, where a default value for the randomness is used when computing Enc_pk(m + f).

Fig. 11. The Protocol For Additively Secret Sharing A Plaintext m ∈ R_p On Input A Ciphertext c_m = Enc_pk(m).
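To see why Σ_i m_i always matches the value inside c'_m, even when DistDec errs, the plaintext bookkeeping of Reshare can be traced without any encryption. The sketch below drops the ciphertexts entirely, uses a toy modulus P = 101, and models a faulty DistDec by a hypothetical additive error dec_error.

```python
import random

P = 101  # toy plaintext modulus (assumption)


def reshare_plain(m, n, rng, dec_error=0):
    """Plaintext trace of Reshare steps 1-5, with no encryption at all.

    dec_error is a hypothetical knob modelling an additive error that a
    faulty DistDec could introduce in step 3.
    Returns (shares m_1..m_n, plaintext that c'_m would contain).
    """
    f = [rng.randrange(P) for _ in range(n)]          # step 1: f_i
    m_plus_f = (m + sum(f) + dec_error) % P           # steps 2-3
    shares = [(m_plus_f - f[0]) % P]                  # step 4: P_1
    shares += [(-f_i) % P for f_i in f[1:]]           # step 4: P_i, i != 1
    fresh = (m_plus_f - sum(f)) % P                   # step 5: content of c'_m
    return shares, fresh
```

With dec_error = 0 the shares reconstruct m; with a nonzero error, both the shares and the value inside c'_m shift by the same amount, which is exactly the guarantee stated in the usage note.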
Protocol Π_PREP

Usage: Note that DataGeneration can be run in four distinct threads, and DataCheck in two threads, with one thread executing the Square and Shared bit checking at the same time. Each thread executes its own check for correct broadcasting, as described in Section 3.1.

Initialize: This produces the keys for encryption and MACs. On input (Start, p) from all the players:

1. The players call F_SHE on query KeyGen so that player i obtains (s_i, pk, enc).

2. The players call F_SHE on query EncCommit(F_p) so that player i obtains a share α_i of the MAC key, and all players get c_i, an encryption of α_i, for 1 ≤ i ≤ n.

3. All players set c_α ← c_1 + ··· + c_n.

Data Generation: On input (DataGen, n_I, n_m, n_s, n_b), the players execute the following subprocedures of DataGen from Figure 13 and Figure 14:

1. InputProduction(n_I)

2. Triples(n_m)

3. Squares(n_s)

4. Bits(n_b)

Data Check: On input DataCheck, the players do the following:

1. Generate two random values t_m, t_sb by running the steps below twice:

(a) Every player P_i samples a random t_i ← F_p and asks F_Commit to broadcast τ_i^t ← Commit(t_i).

(b) Every player P_i calls F_Commit with Open(τ_i^t), and all players obtain t_j for 1 ≤ j ≤ n.

(c) Every player sets t ← t_1 + ··· + t_n. If t = 0, then repeat the previous steps.

2. Execute DataCheck(t_m, t_sb).

Finalize: For the set of partially opened values, run protocol MACCheck from Figure 10.

Abort: If F_SHE outputs a set S of corrupted players at any time, all players output S, and the protocol aborts.

Fig.  12.  The  Preprocessing  Phase 
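Step 1 of Data Check is a commit-then-open coin toss. The sketch below runs all players in one process and uses a SHA-256 hash of a random nonce and the value as a stand-in commitment; this is an assumption for illustration only and is not the F_Commit functionality the protocol relies on. The toy prime P and the name joint_random_nonzero are likewise hypothetical.

```python
import hashlib
import secrets

P = 2**61 - 1  # toy prime (assumption)


def commit(value: int, nonce: bytes) -> bytes:
    """Hash commitment, used here only as a stand-in for F_Commit."""
    return hashlib.sha256(nonce + value.to_bytes(8, "big")).digest()


def joint_random_nonzero(n: int) -> int:
    """Steps (a)-(c) of Data Check step 1, simulated for n honest players."""
    while True:
        ts = [secrets.randbelow(P) for _ in range(n)]
        nonces = [secrets.token_bytes(16) for _ in range(n)]
        # (a) each player broadcasts a commitment to its t_i
        comms = [commit(t_i, r_i) for t_i, r_i in zip(ts, nonces)]
        # (b) openings: everyone re-checks every commitment
        if not all(commit(t_i, r_i) == c_i
                   for t_i, r_i, c_i in zip(ts, nonces, comms)):
            raise ValueError("commitment failed to open")
        # (c) combine, and repeat if the sum is zero
        t = sum(ts) % P
        if t != 0:
            return t
```

Committing before opening is what stops a corrupted player from choosing its t_i after seeing the others, so the sum stays uniform as long as one player is honest.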
Procedure DataGen

Input Production: This produces at least n_I · n shared values r_{i,j} for 1 ≤ i ≤ n_I and 1 ≤ j ≤ n such that player j holds the actual value r_{i,j} and all other players hold only a sharing of this value.

1. For j ∈ {1, ..., n} and k ∈ {1, ..., ⌈2 · n_I/m⌉}:

(a) Player j generates r ∈ R_p.

(b) Player j computes c_r ← Enc_pk(r) and broadcasts the ciphertext to all players.

(c) The parties execute Reshare(c_r, NoNewCiphertext) so that player i obtains the share r_i of r.

(d) All parties compute c_{γ(r)} ← c_r · c_α.

(e) The parties execute Reshare(c_{γ(r)}, NoNewCiphertext) to obtain shares γ(r)_i.

(f) Player i decomposes the plaintext elements r_i and γ(r)_i into their m/2 slot values via the FFT and locally stores the resulting data.

(g) Player j does the same with r to obtain its slot values for i = 1, ..., m/2.

Triples: This produces at least 2 · n_m ⟨·⟩-shared values (a_j, b_j, c_j) such that c_j = a_j · b_j.

1. For k ∈ {1, ..., ⌈4 · n_m/m⌉}:

(a) The players run F_SHE on query EncCommit(R_p) so that player i obtains plaintext a_i and all players obtain c_{a_i}, an encryption of a_i.

(b) The players compute c_a ← c_{a_1} + ··· + c_{a_n}. We define a = a_1 + ··· + a_n, although no party can compute a.

(c) The players run F_SHE on query EncCommit(R_p) so that player i obtains plaintext b_i and all players obtain c_{b_i}, an encryption of b_i.

(d) The players compute c_b ← c_{b_1} + ··· + c_{b_n}. We define b = b_1 + ··· + b_n, although no party can compute b.

(e) All parties compute c_{ab} ← c_a · c_b.

(f) The parties execute Reshare(c_{ab}, NewCiphertext) so that player i obtains the share c_i and all players obtain a ciphertext c_c encrypting the plaintext c = c_1 + ··· + c_n.

(g) All parties compute c_{γ(a)} ← c_a · c_α, c_{γ(b)} ← c_b · c_α and c_{γ(c)} ← c_c · c_α.

(h) The parties execute Reshare(c_{γ(a)}, NoNewCiphertext), Reshare(c_{γ(b)}, NoNewCiphertext) and Reshare(c_{γ(c)}, NoNewCiphertext) to obtain shares γ(a)_i, γ(b)_i and γ(c)_i.

(i) Player i decomposes the various plaintext elements into their m/2 slot values via the FFT and locally stores the resulting m/2 multiplication triples.

Fig.  13.  Production  Of  Tuples  and  Shared  Bits 
Procedure DataGen

Squares: This produces at least (2 · n_s + n_b) ⟨·⟩-shared values (a_j, b_j) such that b_j = a_j · a_j.

1. For k ∈ {1, ..., ⌈2 · (2 · n_s + n_b)/m⌉}:

(a) The players run F_SHE on query EncCommit(R_p) so that player i obtains plaintext a_i and all players obtain c_{a_i}, an encryption of a_i.

(b) The players compute c_a ← c_{a_1} + ··· + c_{a_n}. We define a = a_1 + ··· + a_n, although no party can compute a.

(c) All parties compute c_{a^2} ← c_a · c_a.

(d) The parties execute Reshare(c_{a^2}, NewCiphertext) so that player i obtains the share b_i and all players obtain a ciphertext c_b encrypting the plaintext b = b_1 + ··· + b_n.

(e) All parties compute c_{γ(a)} ← c_a · c_α and c_{γ(b)} ← c_b · c_α.

(f) The parties execute Reshare(c_{γ(a)}, NoNewCiphertext) and Reshare(c_{γ(b)}, NoNewCiphertext) to obtain shares γ(a)_i and γ(b)_i.

(g) Player i decomposes the various plaintext elements into their m/2 slot values via the FFT and locally stores the resulting m/2 squaring tuples.

Bits: This produces at least n_b ⟨·⟩-shared values b_j such that b_j ∈ {0, 1}.

1. For k ∈ {1, ..., ⌈2 · n_b/m⌉ + 1}:^a

(a) The players run F_SHE on query EncCommit(R_p) so that player i obtains plaintext a_i and all players obtain c_{a_i}, an encryption of a_i.

(b) The players compute c_a ← c_{a_1} + ··· + c_{a_n}. We define a = a_1 + ··· + a_n, although no party can compute a.

(c) All parties compute c_{a^2} ← c_a · c_a.

(d) The players invoke protocol DistDec to decrypt c_{a^2} and thereby obtain s = a^2.

(e) If any slot position in s is equal to zero, then set it to one.

(f) A fixed square root t of s is taken, say the one for which each slot position is odd when represented in {1, ..., p − 1}.

(g) Compute c_v ← t^{−1} · c_a; this is an encryption of v = t^{−1} · a, which is a message for which each slot position contains an element of {−1, 1}, bar the ones which we replaced in step (1e).

(h) All parties compute c_{γ(v)} ← c_v · c_α.

(i) The parties execute Reshare(c_v, NoNewCiphertext) and Reshare(c_{γ(v)}, NoNewCiphertext) to obtain shares v_i and γ(v)_i.

(j) Player i decomposes the various plaintext elements into their slot values via the FFT, bar the ones replaced in step (1e), to obtain ⟨v_j⟩ for j = 1, ..., B, where B ≈ m · (p − 1)/(2 · p).

(k) Set ⟨b_j⟩ ← (1/2) · (⟨v_j⟩ + 1) and output ⟨b_j⟩.

^a Notice that in the production of shared bits the number of rounds is one more than one would expect at first glance: this is because some entry of the input vector may be equal to zero, making such an entry unusable for the procedure. This event happens with probability 1/p, so the expected number of bits produced per iteration is m · (p − 1)/(2 · p), rather than m/2 (if no entry were zero). Therefore, in order to produce at least n_b elements, we add an extra round to the procedure.

Fig.  14.  Production  Of  Tuples  and  Shared  Bits  (continued) 
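The Bits procedure turns a random a into a bit via v = a / sqrt(a^2) ∈ {−1, 1} and b = (v + 1)/2. The sketch below performs the same computation slot by slot in the clear (in the protocol a stays encrypted and only s = a^2 is decrypted), assuming a toy prime P ≡ 3 (mod 4) so that a square root is a single modular exponentiation; the odd-representative rule follows step (f), and the function names are illustrative.

```python
import random

P = 103  # toy prime with P % 4 == 3 (assumption: makes square roots easy)


def fixed_sqrt(s):
    """The square root of a nonzero square s mod P that is odd in {1, ..., P-1}."""
    t = pow(s, (P + 1) // 4, P)  # a square root because P ≡ 3 (mod 4)
    return t if t % 2 == 1 else P - t


def random_bits(count, rng):
    """Slot-wise, in-the-clear version of steps (d)-(k) of Bits."""
    inv2 = pow(2, -1, P)
    bits = []
    for _ in range(count):
        a = rng.randrange(P)             # one slot of the random a
        if a == 0:
            continue                     # slot replaced in step (e): unusable
        t = fixed_sqrt(a * a % P)        # steps (d) and (f)
        v = pow(t, -1, P) * a % P        # step (g): v = t^{-1} a is 1 or P-1
        bits.append(inv2 * (v + 1) % P)  # step (k): b = (v + 1)/2
    return bits
```

Since t = ±a, v is ±1 and b lands in {0, 1}; which of the two it is depends on a, which no party learns, so the bit is unknown to all of them.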
Procedure DataCheck

Usage: Note that all players have previously agreed on two common random values t_m, t_sb.

Checking Multiplication Triples: This produces at least n_m checked ⟨·⟩-shared values (a_j, b_j, c_j) such that c_j = a_j · b_j.
1. For k ∈ {1, ..., n_m}:

(a) Take two unused multiplication triples (⟨a⟩, ⟨b⟩, ⟨c⟩), (⟨f⟩, ⟨g⟩, ⟨h⟩) from the list determined earlier.

(b) Partially open t_m · ⟨a⟩ − ⟨f⟩ to obtain ρ and ⟨b⟩ − ⟨g⟩ to obtain σ.

(c) Evaluate t_m · ⟨c⟩ − ⟨h⟩ − σ · ⟨f⟩ − ρ · ⟨g⟩ − σ · ρ and partially open the result to obtain τ.

(d) If τ ≠ 0 then output ∅ and abort.

(e) Output (⟨a⟩, ⟨b⟩, ⟨c⟩) as a valid multiplication triple.

Checking Squaring Tuples: This produces at least n_s checked ⟨·⟩-shared values (a_j, b_j) such that b_j = a_j^2.

1. For k ∈ {1, ..., n_s}:

(a) Take two unused squaring tuples (⟨a⟩, ⟨b⟩), (⟨f⟩, ⟨h⟩) from the list determined earlier.

(b) Partially open t_sb · ⟨a⟩ − ⟨f⟩ to obtain ρ.

(c) Evaluate t_sb^2 · ⟨b⟩ − ⟨h⟩ − ρ · (t_sb · ⟨a⟩ + ⟨f⟩) and partially open the result to obtain τ.

(d) If τ ≠ 0 then output ∅ and abort.

(e) Output (⟨a⟩, ⟨b⟩) as a valid squaring tuple.

Checking Shared Bits: This produces at least n_b checked ⟨·⟩-shared values b_j such that b_j ∈ {0, 1}.

1. For k ∈ {1, ..., n_b}:

(a) Take an unused squaring tuple (⟨f⟩, ⟨h⟩) and an unused bit sharing ⟨a⟩ from the lists determined earlier.

(b) Partially open t_sb · ⟨a⟩ − ⟨f⟩ to obtain ρ.

(c) Evaluate t_sb^2 · ⟨a⟩ − ⟨h⟩ − ρ · (t_sb · ⟨a⟩ + ⟨f⟩) and partially open the result to obtain τ.

(d) If τ ≠ 0 then output ∅ and abort.

(e) Output ⟨a⟩ as a valid bit sharing.

Fig.  15.  Check  The  Output  Of  The  Data  Production  Procedure 
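The three checks in DataCheck rely on polynomial identities that vanish exactly when the tuples are well formed. The sketch below evaluates ρ, σ and τ directly on plaintext values over a toy prime P (in the protocol they are computed on ⟨·⟩-shared values and partially opened), so it illustrates the algebra rather than the sharing; the function names are hypothetical.

```python
P = 2**61 - 1  # toy prime (assumption)


def check_triple(t, abc, fgh):
    """Steps (b)-(c) for triples on plaintext values; returns tau."""
    (a, b, c), (f, g, h) = abc, fgh
    rho = (t * a - f) % P
    sigma = (b - g) % P
    return (t * c - h - sigma * f - rho * g - sigma * rho) % P


def check_square(t, ab, fh):
    """Steps (b)-(c) for squaring tuples on plaintext values; returns tau."""
    (a, b), (f, h) = ab, fh
    rho = (t * a - f) % P
    return (t * t * b - h - rho * (t * a + f)) % P


def check_bit(t, a, fh):
    """Steps (b)-(c) for a candidate bit a and a squaring tuple (f, h)."""
    f, h = fh
    rho = (t * a - f) % P
    return (t * t * a - h - rho * (t * a + f)) % P
```

With a correct sacrificed tuple, a triple whose c is off by e gives τ = t_m · e, and a candidate bit a gives τ = t_sb^2 · (a − a^2), which vanishes only for a ∈ {0, 1}.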
D.2 Functionalities


The functionality F_PREP

Let A be the set of indices of corrupted players. Symbols in bold denote vectors in (F_p)^k. Arithmetic is componentwise.

Initialize: On input (Start, p) from honest players and adversary, the functionality sets the internal flag BreakDown to false and then does the following:

1. For each corrupted player i ∈ A, the functionality accepts shares α_i from the adversary, and it samples α_i at random for each i ∉ A. Then the functionality sets α ← α_1 + ··· + α_n.

2. The functionality waits for signal Abort, Proceed or Cheat from the adversary.

3. If it received Proceed, the functionality outputs α_i to player i.

4. Otherwise, and if the functionality did not abort in Cheat, it outputs the adversary's contribution A_i to player i.
Computation: On input DataGen from all honest players and the adversary, and only if the functionality received Proceed (or BreakDown is true), it executes the data generation procedures specified in Figure 17.

Macro Angle(v_1, ..., v_n, Δ_γ, k): This macro is run by the functionality at several points to create representations ⟨·⟩.

1. It gets {γ_i}_{i∈A} from the adversary.

2. Let v = v_1 + ··· + v_n, and set γ(v) ← α · v + Δ_γ.

3. Sample at random γ(v)_i ← (F_p)^k for i ∉ A, subject to γ(v) = Σ_{i=1}^n γ(v)_i.

4. Return (γ(v)_1, ..., γ(v)_n).

Cheat: The functionality does one of the following:

- With probability 1/c, it sends Success to the adversary and sets the internal flag BreakDown to true.

- Otherwise it sends NoSuccess to the adversary and players, and goes to "Abort".

Abort: The functionality waits for S ⊆ A from the adversary and then outputs S to all players.

Fig. 16. MAC Generation and Covert Procedures to Generate Auxiliary Data
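Macro Angle fixes the total MAC γ(v) = α · v + Δ_γ and then lets the honest MAC shares absorb whatever the adversary chose for its own shares. A minimal sketch for a single F_p element (k = 1), assuming 0-based player indices, a toy prime P, and at least one honest player; the function and parameter names are illustrative.

```python
import random

P = 2**61 - 1  # toy prime (assumption)


def angle(v_shares, alpha, delta, corrupt_gammas, honest, rng):
    """Macro Angle for a single F_P element (k = 1), 0-based player indices.

    v_shares:       all players' shares v_0, ..., v_{n-1}
    alpha:          the MAC key
    delta:          adversarial MAC offset Delta_gamma
    corrupt_gammas: {i: gamma_i} chosen by the adversary for i in A
    honest:         indices of the honest players (assumed non-empty)
    """
    v = sum(v_shares) % P
    gamma_v = (alpha * v + delta) % P              # step 2
    gammas = dict(corrupt_gammas)                  # step 1
    *free, last = sorted(honest)
    for i in free:                                 # step 3: random honest shares...
        gammas[i] = rng.randrange(P)
    # ...with the last one fixed so that all shares sum to gamma(v)
    gammas[last] = (gamma_v - sum(gammas.values())) % P
    return [gammas[i] for i in range(len(v_shares))]  # step 4
```

Fixing the last honest share is one standard way to sample uniformly subject to a sum constraint; the corrupted shares are returned unchanged.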
The functionality F_PREP (continued)

Let A be the set of indices of corrupted players. Symbols in bold denote vectors in (F_p)^k. Arithmetic is componentwise.

Input Production: On input DataType = (InputPrep, n_I):

1. The functionality chooses random values I = {r^{(i)} ∈ (F_p)^{n_I} | i ∉ A}.

2. It accepts from the adversary corrupted values {r^{(i)} ∈ (F_p)^{n_I} | i ∈ A}, corrupted shares {r_k^{(i)} ∈ (F_p)^{n_I} | k ∈ A, i ≤ n}, and offsets for data and MACs {Δ^{(i)}, Δ_γ^{(i)} ∈ (F_p)^{n_I} | i ≤ n}. Then it does the following:

(a) Sample honest shares {r_k^{(i)} | k ∉ A, i ≤ n} subject to r^{(i)} + Δ^{(i)} = Σ_{k=1}^n r_k^{(i)}.

(b) Run macro Angle(r_1^{(i)}, ..., r_n^{(i)}, Δ_γ^{(i)}, n_I), for i ≤ n.

(c) Output {(r_i^{(k)}, γ_i(r^{(k)}))}_{k≤n} to player i, or, if BreakDown is true, output the adversary's contribution A_i to player i.

Multiplication Triples: On input DataType = (Triples, n_m):

1. Choose honest shares I = {(a_i, b_i) ∈ (F_p)^{2·n_m} | i ∉ A}.

2. It accepts corrupted shares {(a_i, b_i, c_i) ∈ (F_p)^{3·n_m} | i ∈ A} and MAC offsets {(Δ_γ^{(a)}, Δ_γ^{(b)}, Δ_γ^{(c)}) ∈ (F_p)^{3·n_m}} from the adversary. It performs the following:

(a) Set c ← (a_1 + ··· + a_n) · (b_1 + ··· + b_n).

(b) Compute a set of honest shares {c_i | i ∉ A} subject to c = Σ_{i=1}^n c_i.

(c) Run the macros Angle(a_1, ..., a_n, Δ_γ^{(a)}, n_m), Angle(b_1, ..., b_n, Δ_γ^{(b)}, n_m) and Angle(c_1, ..., c_n, Δ_γ^{(c)}, n_m).

(d) Output {(a_i, γ_i(a)), (b_i, γ_i(b)), (c_i, γ_i(c))} to player i, or, if BreakDown is true, output the adversary's contribution A_i to player i.

Squaring Tuples: On input DataType = (Squares, n_s):

1. Choose honest shares I = {a_i ∈ (F_p)^{n_s} | i ∉ A}.

2. It accepts corrupted shares {(a_i, s_i) ∈ (F_p)^{2·n_s} | i ∈ A} and MAC offsets {(Δ_γ^{(a)}, Δ_γ^{(s)}) ∈ (F_p)^{2·n_s}} from the adversary. It does the following:

(a) Set s ← (a_1 + ··· + a_n) · (a_1 + ··· + a_n).

(b) Compute a set of honest shares {s_i | i ∉ A} subject to s = Σ_{i=1}^n s_i.

(c) Run the macros Angle(a_1, ..., a_n, Δ_γ^{(a)}, n_s) and Angle(s_1, ..., s_n, Δ_γ^{(s)}, n_s).

(d) Output {(a_i, γ_i(a)), (s_i, γ_i(s))} to player i, or, if BreakDown is true, output the adversary's contribution A_i to player i.

Shared Bits: On input DataType = (Bits, n_b):

1. It gets shares {b_i ∈ (F_p)^{n_b} | i ∈ A} and MAC offsets {Δ_γ^{(b)} ∈ (F_p)^{n_b}} from the adversary.

(a) Uniformly sample honest shares I = {b_i ∈ (F_p)^{n_b} | i ∉ A} subject to the condition b_1 + ··· + b_n ∈ {0, 1}^{n_b}.

(b) Run the macro Angle(b_1, ..., b_n, Δ_γ^{(b)}, n_b).

(c) Output (b_i, γ_i(b)) to player i, or, if BreakDown is true, output the adversary's contribution A_i to player i.

Fig. 17. Operations to Generate Auxiliary Data for the Online Phase




APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


D.3  Proof  of  Lemma  1 


Proof. 

We  here  inspect  the  correctness  and  the  soundness  error  of  the  MACCheck  protocol.  In  order  to  understand  the 
probability  of  an  adversary  being  able  to  cheat,  we  design  the  following  security  game. 

1. The challenger generates the secret key α ← α_1 + ··· + α_n and MACs γ(a_j) ← α · a_j, and sends messages a_1, ..., a_t to the adversary.

2. The adversary sends back messages a'_1, ..., a'_t.

3. The challenger generates random values r_1, ..., r_t ← F_p and sends them to the adversary.

4. The adversary provides an error Δ.

5. Set a ← Σ_{j=1}^t r_j · a'_j, γ_i ← Σ_{j=1}^t r_j · γ(a_j)_i and σ_i ← γ_i − α_i · a. Now, the challenger checks that σ_1 + ··· + σ_n = Δ.

The adversary wins the game if there is a j for which a'_j ≠ a_j and the final check goes through.

The second step in the game, where the adversary sends the a'_j's, models the fact that corrupted players can choose to lie about their shares of values opened during the protocol execution. Δ models the fact that the adversary is allowed to introduce errors on the MACs.

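Now, let us look at the probability of winning the game if the r_j's are randomly chosen. If the check goes through, the following equalities hold:

$$
\begin{aligned}
\Delta &= \sum_{i=1}^{n} \sigma_i = \sum_{i=1}^{n} \left( \gamma_i - \alpha_i \cdot a \right)
        = \sum_{i=1}^{n} \left( \sum_{j=1}^{t} r_j \cdot \gamma(a_j)_i \right) - \alpha \cdot a \\
       &= \sum_{j=1}^{t} r_j \left( \sum_{i=1}^{n} \gamma(a_j)_i \right) - \alpha \sum_{j=1}^{t} r_j \cdot a'_j
        = \alpha \sum_{j=1}^{t} r_j \cdot a_j - \alpha \sum_{j=1}^{t} r_j \cdot a'_j .
\end{aligned}
$$

So, the following equality holds:

$$
\Delta = \alpha \cdot \sum_{j=1}^{t} r_j \, (a_j - a'_j). \qquad (1)
$$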

First we consider the case where Σ_{j=1}^t r_j (a_j − a'_j) ≠ 0, so α = Δ / Σ_{j=1}^t r_j (a_j − a'_j). This implies that being able to pass the check is equivalent to guessing α. However, since the adversary has no information about α, this happens with probability only 1/|F_p|. So what is left is to argue that Σ_{j=1}^t r_j (a_j − a'_j) = 0 happens with very low probability. This can be seen as follows. We define μ_j := a_j − a'_j, μ := (μ_1, ..., μ_t) and r := (r_1, ..., r_t). Now f_μ(r) := r · μ = Σ_{j=1}^t r_j μ_j defines a linear mapping, which is not the 0-mapping since at least one μ_j ≠ 0. The rank-nullity theorem then tells us that dim(ker(f_μ)) = t − 1. Also, since r is random and the adversary does not know r when choosing the a'_j's, the probability that r ∈ ker(f_μ) is |F_p|^{t−1}/|F_p|^t = 1/|F_p|. Summing up, the total probability of winning the game is at most 2/|F_p|.

For correctness, we use the fact that Equation (1) holds with probability one if a'_j = a_j for all j and Δ = 0 (honest prover).

□ 
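The bound in Lemma 1 can be checked exhaustively for tiny parameters. The sketch below fixes one natural adversary strategy (an assumption; the lemma covers all strategies): after seeing r, send Δ = 0 whenever Σ_j r_j (a_j − a'_j) = 0, and otherwise bet that α equals a fixed guess. Averaging over every α ∈ F_p and every r ∈ F_p^t gives a success rate of 1/p + (1 − 1/p)/p = (2p − 1)/p², just under the 2/|F_p| bound.

```python
from fractions import Fraction
from itertools import product


def win_rate(p, a, a_prime, guess):
    """Exact success rate of one adversary strategy in the Lemma 1 game.

    Strategy (illustrative): after seeing r, send Delta = guess * inner, where
    inner = sum_j r_j (a_j - a'_j). This is Delta = 0 when inner == 0, and
    otherwise amounts to betting that alpha == guess.
    """
    mu = [(x - y) % p for x, y in zip(a, a_prime)]
    wins = total = 0
    for alpha in range(p):
        for r in product(range(p), repeat=len(a)):
            inner = sum(r_j * m_j for r_j, m_j in zip(r, mu)) % p
            delta = guess * inner % p
            # The final check passes iff Equation (1) holds.
            wins += delta == alpha * inner % p
            total += 1
    return Fraction(wins, total)
```

For example, win_rate(7, [1, 2], [3, 2], 4) evaluates to Fraction(13, 49), compared with the bound 2/7 = 14/49.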
D.4 Proof of Theorem 4


Proof. We construct a simulator S_PREP (given in Figure 18 and Figure 19) such that no polynomial-time environment can distinguish, with significant probability, a view obtained by running Π_PREP from a view obtained by running S_PREP ∘ F_PREP. The environment's view is the collection of all intermediate messages that corrupted players send and receive, plus the inputs and outputs of all players.

In a nutshell, the simulator runs a copy of Π_PREP with the adversary, acting on behalf of the honest players. Keys for the underlying cryptosystem and MACs are generated by simulating queries KeyGen and EncCommit to F_SHE, respectively. Note that due to the distributed decryption, data for the (online) input preparation stage might be incorrectly secret shared, and all types of data might be incorrectly MAC'd. Since the simulator knows α and s, it can compute the offsets on the secret sharings and MACs and pass them to F_PREP.

Before we discuss indistinguishability, we explain how the cheat mechanism is handled in the simulation. In the execution of Π_PREP, the environment may send Cheat either in the initial query KeyGen or in any later query EncCommit to F_SHE. Thus, the success probability depends on the number of cheat attempts. The simulator ensures two things:

1) Whenever the environment sends the first Cheat to what it thinks is F_SHE, the call is forwarded to F_PREP, which decides whether or not it is successful. 2) Assuming this cheat was successful, the simulator recreates the success probability that a real interaction would have. This is needed, as otherwise the environment could distinguish. The inner procedure SEncCommit is designed for this purpose.

We now turn to showing indistinguishability. There is essentially one difference between a simulated run and a real execution of Π_PREP: in a simulated run, the honest shares used in the interaction are randomly sampled by the simulator. These shares correspond to the MAC key, and to shares of generated data together with the shares of their MACs. In the end, F_PREP will output data using its own honest shares of α, and its own honest shares of data and MACs.

We can split the view of the environment into four chunks, namely the messages exchanged in DataGen, in DataCheck or in MACCheck, and the players' output of F_PREP. Clearly, indistinguishability of simulated and real views of the DataGen chunk follows from the semantic security of the underlying cryptosystem. For the DataCheck chunk, note that all opened values are a combination of output data and sacrificed data. The latter does not form part of the final output, and therefore the environment can by no means reconstruct the set of opened values using its view, as it does not know the honest shares of the sacrificed data. In other words, from the environment's point of view the openings are randomized by the sacrificed data, so the best it can do is to guess the sacrificed honest shares, which succeeds with probability 1/|F_p| for each share. For the MACCheck chunk, we refer to the fact that the soundness error of MACCheck is 2/p, as shown in Lemma 1. Both probabilities are negligible if p is exponential in the security parameter. Lastly, the output of F_PREP is also consistent with what the environment sees in the corrupted transcripts. This is because the offsets (the quantities denoted by Δ) are simply the differences between deviated and correctly computed data, and are therefore independent of what the data refers to.

If the protocol aborts in DataCheck or MACCheck, the players output ∅, and so does F_PREP on instruction of the simulator. This corresponds to the fact that those protocols do not reveal the identity of any corrupted party.

It remains to show what happens in case Cheat or Abort is sent by the environment. If the cheat did not go through, the players' output is a single message S for a set S of corrupted players in both the real and the simulated interaction. On the other hand, if the cheat did go through, the functionality F_PREP breaks down, and the simulator can decide what MAC key is used and what data is output to every player, so it simply gives F_PREP the data generated during the interaction. If the environment sends Abort and a set S of corrupted players, this is simply passed to F_PREP, which forwards it to the players.
The simulator S_PREP

Initialize: 

- The simulator first sends (Start, p) to F_PREP and then interacts with the adversary, acting as F_SHE on query KeyGen, to generate the encryption public key (pk, enc) and a complete set of shares {s_1, ..., s_n} of the secret key. If the adversary sends Cheat to F_SHE, the simulator forwards it to F_PREP. If the cheat passed through, the simulator sets the flag BreakDown to true; otherwise it is set to false.

- The generation of the MAC key α is done as in the protocol, but calling SEncCommit(F_p) instead of F_SHE on query EncCommit. The simulator stores α ← α_1 + ··· + α_n for later use.

- Lastly, it gives α_i to F_PREP for i ∈ A if BreakDown is false, and for all i ≤ n otherwise.

- If the simulation of F_SHE aborts on KeyGen or EncCommit, go to "Abort".

Command = DataGen: On input (n_I, n_m, n_s, n_b) from honest players and adversary, the simulator sets

T_Input ← SimDataGen(InputPrep, n_I)

T_Triples ← SimDataGen(Triples, n_m)

T_Squares ← SimDataGen(Squares, n_s)

T_Bits ← SimDataGen(Bits, n_b),

where SimDataGen is specified in Figure 19. These calls also return a decision bit. If it is set to Abort, the simulator goes to "Abort".

Command  =  DataCheck: 

- Step 1 is executed as in the protocol, but calling SEncCommit(R_p). The simulator goes to "Abort" if SEncCommit says so.

- The simulator performs steps (a)-(d) of the subprocedures Triples, Squares and Bits of DataCheck. In each iteration k, it learns the value τ_k. If any of these values is non-zero, the simulator sends Abort and ∅ to F_PREP. Otherwise, the algebraic relation among the generated data is correct with probability 1 − 1/p.

Finalize: At this point, either the functionality is waiting for the instruction Proceed or Abort, or a complete breakdown occurred and the functionality is waiting for the command DataGen and output values from the adversary.

1. The simulator engages with the adversary in a normal run of MACCheck on behalf of each honest player i. Note that to generate the honest σ_i the simulator uses the shares α_i. If σ_1 + ... + σ_n ≠ 0, send Abort and ∅ to F_Prep.

2. Otherwise, send Success to the adversary, and send the following to F_Prep:

- If BreakDown is false, send T_Input, T_Triples, T_Squares, T_Bits.

- If BreakDown is true, send all the data (corresponding to honest and corrupted players) generated in the execution of SimDataGen.

Abort: If the simulated F_SHE aborts outputting a set S of corrupted players, input Abort and S to F_Prep.

Fig. 18. The Simulator S_Prep for the Preprocessing Phase
The simulator S_Prep

SimDataGen(DataType): This procedure prepares the data to be input to F_Prep.

DataType = InputPrep:

- The simulator engages in a normal run of steps (a)-(g), calling SReshare instead of Reshare. If, at any point, one of the calls returns Abort, the simulator sets Decision ← Abort and T_Input ← ∅.

- Otherwise all the rounds were successful, and the simulator sets Decision ← Continue. Note that in step (c) (after unpacking all the rounds), the simulator obtains the players' shares and MAC shares {r_{i,k}, γ(r_i)_k | i, k ≤ n}. Their sum over k is (presumably) the input of player i. The simulator has the secret key, so it can obtain the real input from the broadcast ciphertexts (if P_i is corrupted) or from what it generated itself (if P_i is honest). It computes the offsets Δ_i^(r) ← … and Δ_i^(γ) ← Σ_k … .

There are two possibilities:

• Flag BreakDown is set to false. This means no cheat has occurred, so the simulator prepares the corrupt inputs, the corrupt shares and MAC shares, and the offsets. That is, it sets T_Input ← {(…, Δ_i^(r), Δ_i^(γ)) | k ∈ A, i ≤ n}.

• Flag BreakDown is set to true. Then there was at least one successful cheat, and the functionality is waiting for the adversary's contributions. The simulator sets T_Input to be the output of each player.

DataType = Triples, Squares, Bits: The simulator engages in a normal run of the subprocedure specified by DataType, but calling SEncCommit(R_p) and SReshare(c_m) instead of F_EncCommit and Reshare(c_m). If any of these macros returns Abort, the simulator sets Decision ← Abort and T_DataType ← ∅. In any other case the simulator sets Decision ← Continue, handles the BreakDown flag as above, and does the following:

Triples: Set T_Triples ← {(a_i, b_i, c_i, γ(a)_i, γ(b)_i, γ(c)_i, …) | i ≤ n}. The shares are unpacked in step (i): corrupt shares are given by the adversary, and honest shares are sampled uniformly. MAC shares are produced after executing SReshare to simulate step (h), and the offsets are computed as explained earlier.

Squares: Set T_Squares ← {(a_i, b_i, γ(a)_i, γ(b)_i, …) ∈ (F_p)^… | i ≤ n}. Shares, MAC shares and offsets are obtained as explained above.

Bits: Set T_Bits ← {(b_i, γ_i, …) ∈ (F_p)^… | i ≤ n}. A number n'_b ≥ n_b of binary shares and MACs has been computed. The exact amount n'_b is round-dependent and is expected to be approximately (n_b + m/2) · (p − 1)/p.

Return (Decision, T_DataType).

Macro SEncCommit(cond): This macro simulates a call to F_SHE on query EncCommit.

- The simulator receives the corrupted seeds s_i from the adversary (which believes it is interacting with F_SHE) and computes m_i and c_{m_i} for i ∈ A, which are given to the adversary. Then the simulator generates m_i uniformly and c_i ← Enc_pk(m_i) for i ∉ A, and gives c_i to the adversary. It waits for the response Proceed, Cheat or Abort.

- If the adversary gives Proceed, the simulator sets Decision ← Continue; if the adversary gives Abort, it sets Decision ← Abort and also sends Abort to F_Prep.

- If the adversary gives (Cheat, {m*_i, c*_i}_{i∈A}), set m_i ← m*_i and c_i ← c*_i for i ∈ A, and do the following:

1. If flag BreakDown is false, send Cheat to F_Prep and then set BreakDown to true. There are two possibilities:

(a) The functionality returns Success: set Decision ← Continue.

(b) The functionality returns NoSuccess: set Decision ← Abort.

2. If BreakDown was already set to true, with probability 1/c set Decision ← Continue, and otherwise set Decision ← Abort.

- Return (Decision, m_1, ..., m_n, c_1, ..., c_n).

Macro SReshare(c_m)

- Set (f_1, ..., f_n, c_{f_1}, ..., c_{f_n}) ← SEncCommit(R_p) and f ← f_1 + ... + f_n. Set Decision ← Abort if SEncCommit says so.

- Otherwise, set Decision ← Continue and run steps 2-5 of Reshare. Note that in step 3 the simulator might obtain an invalid value (m + f)*. Set m_1 ← (m + f)* − f_1 and m_i ← −f_i for i ≠ 1.

- Return the shares (Decision, m_1, ..., m_n).

Fig. 19. Internal Procedures of the Simulator S_Prep
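As a sanity check on the share arithmetic of SReshare, the following Python sketch works entirely in the clear: encryption and EncCommit are omitted, the masks f_i are sampled directly, and the prime P = 2^61 − 1 and the function name are illustrative assumptions. It shows that m_1 = (m + f)* − f_1 together with m_i = −f_i (i ≠ 1) is an additive sharing of m, and that an additive error in the decrypted value (m + f)* shifts the shared value by exactly that error.

```python
import random

P = 2**61 - 1  # hypothetical prime modulus p (a Mersenne prime), for illustration only

def sreshare_shares(m, n, error=0):
    """Plaintext sketch of the share derivation at the end of SReshare.

    Encryption and commitments are omitted: each player's mask f_i is simply
    sampled here. 'error' models an additive deviation in the decrypted value,
    i.e. (m + f)* = m + f + error.
    """
    f = [random.randrange(P) for _ in range(n)]
    opened = (m + sum(f) + error) % P                  # (m + f)*
    # m_1 <- (m + f)* - f_1, and m_i <- -f_i for i != 1
    return [(opened - f[0]) % P] + [(-fi) % P for fi in f[1:]]

print(sum(sreshare_shares(42, n=3)) % P)               # 42
print(sum(sreshare_shares(42, n=3, error=5)) % P)      # 47
```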
E Online Phase: Protocol, Functionalities and Simulators
E.1 Protocols


Protocol Π_Online

Initialize: The parties call F_Prep to get the shares α_i of the MAC key, a number of multiplication triples (⟨a⟩, ⟨b⟩, ⟨c⟩), squares (⟨a⟩, ⟨b⟩), bits ⟨b⟩, and mask values (r_i, ⟨r_i⟩) as needed for the circuit being evaluated. If F_Prep aborts outputting a set S of corrupted players, the players output S and abort. Then the operations specified below are performed according to the circuit.

Input: To share his input x_i, player i takes an available mask value (r_i, ⟨r_i⟩) and does the following:

1. Broadcast ε ← x_i − r_i.

2. The players compute ⟨x_i⟩ ← ⟨r_i⟩ + ε.

Add: On input (⟨x⟩, ⟨y⟩), the players locally compute ⟨x + y⟩ ← ⟨x⟩ + ⟨y⟩.

Multiply: On input (⟨x⟩, ⟨y⟩), the players do the following:

1. Take one multiplication triple (⟨a⟩, ⟨b⟩, ⟨c⟩) and partially open ⟨x⟩ − ⟨a⟩ and ⟨y⟩ − ⟨b⟩ to get ε and ρ respectively.

2. Locally, each player sets ⟨z⟩ ← ⟨c⟩ + ε · ⟨b⟩ + ρ · ⟨a⟩ + ε · ρ.

Square: On input ⟨x⟩ the players do the following:

1. Take a square pair (⟨a⟩, ⟨b⟩) and partially open ⟨x⟩ − ⟨a⟩ so that all players get ε.

2. All players locally compute ⟨z⟩ ← ⟨b⟩ + 2 · ε · ⟨x⟩ − ε².

Output: This procedure is entered once the players have finished the circuit evaluation, but the final output ⟨y⟩ has not yet been opened.

1. The players call the MACCheck protocol on input all values opened so far. If it fails, they output ∅ and abort. Here ∅ represents the fact that the corrupted players remain undetected.

2. The players open ⟨y⟩ and call MACCheck on input y to verify its MAC. If the check fails, they output ∅ and abort; otherwise they accept y as a valid output.

Fig. 20. Operations for Secure Function Evaluation
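The arithmetic of Fig. 20 can be exercised with a small plaintext model. In this sketch (an illustration, not the SPDZ implementation) a trusted function `share` stands in for F_Prep, each shared value ⟨x⟩ is a pair (value shares, MAC shares) whose sums are x and α·x, and `mac_ok` is a simplified test on a single opened value rather than the MACCheck protocol with commitments. The modulus, the number of players and all function names are assumptions.

```python
import random

P = 2**61 - 1   # hypothetical prime modulus p
N = 3           # number of players (illustrative)

alpha_sh = [random.randrange(P) for _ in range(N)]   # MAC key shares alpha_i
ALPHA = sum(alpha_sh) % P

def share(x):
    """Trusted-dealer stand-in for F_Prep: additive shares of x and of alpha*x."""
    xs = [random.randrange(P) for _ in range(N - 1)]
    gs = [random.randrange(P) for _ in range(N - 1)]
    return xs + [(x - sum(xs)) % P], gs + [(ALPHA * x - sum(gs)) % P]

def add(u, v):
    return ([(s + t) % P for s, t in zip(u[0], v[0])],
            [(s + t) % P for s, t in zip(u[1], v[1])])

def scale(k, u):
    return [k * s % P for s in u[0]], [k * g % P for g in u[1]]

def add_const(u, e):
    """<x> + e: player 1 adds e to its share, player i adds alpha_i * e to its MAC share."""
    xs = [(u[0][0] + e) % P] + u[0][1:]
    gs = [(g + a * e) % P for g, a in zip(u[1], alpha_sh)]
    return xs, gs

def open_(u):
    """Partial opening: reveal the value, keep the MAC shares."""
    return sum(u[0]) % P

def mac_ok(u):
    """Simplified MAC test on one opened value y: sum_i (gamma_i - alpha_i * y) == 0."""
    y = open_(u)
    return sum(g - a * y for g, a in zip(u[1], alpha_sh)) % P == 0

def input_value(x):
    r = random.randrange(P)
    eps = (x - r) % P                        # broadcast by the inputting player
    return add_const(share(r), eps)          # <x> = <r> + eps

def multiply(u, v):
    a, b = random.randrange(P), random.randrange(P)
    A, B, C = share(a), share(b), share(a * b % P)
    eps = open_(add(u, scale(P - 1, A)))     # x - a
    rho = open_(add(v, scale(P - 1, B)))     # y - b
    z = add(C, add(scale(eps, B), scale(rho, A)))
    return add_const(z, eps * rho % P)       # c + eps*b + rho*a + eps*rho

def square(u):
    a = random.randrange(P)
    A, B = share(a), share(a * a % P)
    eps = open_(add(u, scale(P - 1, A)))     # x - a
    z = add(B, scale(2 * eps % P, u))
    return add_const(z, (-eps * eps) % P)    # b + 2*eps*x - eps^2

x, y = input_value(7), input_value(5)
z = multiply(x, y)
print(open_(z), mac_ok(z))                   # 35 True
print(open_(square(x)))                      # 49
```

Modifying any single share without adjusting the MAC shares makes `mac_ok` fail except with probability about 1/P, which is the intuition behind the MAC checks in the Output step.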
E.2 Functionalities


Functionality F_Online

Initialize: On input (init, p, k) from all parties, the functionality stores (domain, p, k) and waits for an input from the environment. Depending on this input, the functionality does the following:

Proceed: It sets BreakDown to false and continues.

Cheat: With probability 1/c, it sets BreakDown to true, outputs Success to the environment and continues. Otherwise it outputs NoSuccess and proceeds as in Abort.

Abort: It waits for the environment to input a set S of corrupted players, outputs it to the players, and aborts.

Input: On input (input, P_i, varid, x) from P_i and (input, P_i, varid, ?) from all other parties, with varid a fresh identifier, the functionality stores (varid, x). If BreakDown is true, it also outputs x to the environment.

Add: On command (add, varid_1, varid_2, varid_3) from all parties (if varid_1, varid_2 are present in memory and varid_3 is not), the functionality retrieves (varid_1, x), (varid_2, y) and stores (varid_3, x + y).

Multiply: On input (multiply, varid_1, varid_2, varid_3) from all parties (if varid_1, varid_2 are present in memory and varid_3 is not), the functionality retrieves (varid_1, x), (varid_2, y) and stores (varid_3, x · y).

Square: On input (square, varid_1, varid_2) from all parties (if varid_1 is present in memory and varid_2 is not), the functionality retrieves (varid_1, x) and stores (varid_2, x²).

Output: On input (output, varid) from all honest parties (if varid is present in memory), the functionality retrieves (varid, y) and outputs it to the environment.

- If BreakDown is false, the functionality waits for an input from the environment. If this input is Deliver, then y is output to all players. Otherwise ∅ is output to all players.

- If BreakDown is true, the functionality waits for y* from the environment and outputs it to all players.

Fig. 21. The ideal functionality for MPC
E.3 Proof of Theorem 5


Proof.

We construct a simulator S_Online to work on top of the ideal functionality F_Online, such that the adversary cannot distinguish whether it is playing with the protocol Π_Online and F_Prep, or with the simulator and F_Online. See Appendix E for the complete description of the simulator.

We now proceed with the analysis of the simulation, first arguing that all the steps before the output are perfectly simulated, and then showing that the simulated output is statistically close to the one in the protocol.

During initialization, the simulator merely acts as F_Prep, with the difference that the decision about the success of a cheating attempt is made by F_Online. If the cheating was successful, F_Online will output all honest inputs, and the simulator can determine all outputs. Therefore, the simulation will precisely agree with the protocol. For the rest of the proof, we assume that there was no cheating attempt.

In the input stage the values broadcast by the honest players are uniform in the protocol as well as in the simulation. Addition does not involve communication, while multiplication and squaring involve partial openings: in the protocol a partial opening reveals uniform values, and the same happens in a simulated run. Moreover, the MACs have the same distribution in both the protocol and the simulation.

In the output stage of both the real and the simulated run, if the output y is delivered, the environment sees y and the honest players' shares, which are uniform and compatible with y and its MAC. Moreover, in a simulated run the output y is a correct evaluation of the function on the inputs provided by the players in the input phase. To conclude, we need to make sure that the same applies to the real protocol with overwhelming probability. As shown in Lemma 1, the adversary can cheat in one MACCheck call with probability at most 2/p. Since the Output procedure calls MACCheck twice, the overall cheating probability is at most 4/p, which is negligible since p is assumed to be exponential in the security parameter. This concludes the proof.

□

Simulator S_Online

Initialize: The initialization procedure is simulated by running a local copy of F_Prep. Notice that all the data given to the adversary is known by the simulator.

If the environment inputs Proceed, Cheat, or Abort to the copy of F_Prep, the simulator does the same to F_Online and forwards the output of F_Online to the environment. If the output is Success, the simulator sets BreakDown to true and uses the environment's inputs as preprocessed data. If F_Online outputs NoSuccess or the input was Abort, the simulator waits for input S from the environment, forwards it to F_Online, and aborts.

Input:

- If BreakDown is false, honest input is performed according to the protocol, with a dummy input, for example zero.

- If BreakDown is true, F_Online outputs the inputs of the honest players, which can then be used in the simulation.

For inputs given by a corrupt player P_i, the simulator waits for P_i to broadcast the (possibly incorrect) value ε', computes x'_i ← r_i + ε' and uses x'_i as input to F_Online.

Add/Multiply/Square: These procedures are performed according to the protocol. The simulator also calls the respective procedure of F_Online.

Output: F_Online outputs y to the simulator.

- If BreakDown is false, the simulator now has to provide the honest players' shares of this value. It has already computed an output value y' using the dummy inputs for the honest players, so it can select a random honest player, add y − y' to its share and add α(y − y') to its MAC share; this is possible since the simulator knows α. After that, the simulator is ready to open y according to the protocol. If y passes the check, the simulator sends Deliver to F_Online.

- If BreakDown is true, the simulator inputs the result of the simulation to F_Online.

Fig. 22. Simulator for the Online phase
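The output adjustment made by S_Online when BreakDown is false can be illustrated as follows. This is a sketch with a hypothetical prime modulus and helper names: the sharing of the dummy output y' and the sharing of its MAC are shifted only in one randomly chosen honest player's position, so the corrupted players' shares are untouched while the sharing now opens to y with a consistent MAC.

```python
import random

P = 2**61 - 1  # hypothetical prime modulus, for illustration only

def additive_sharing(v, n):
    """Random additive sharing of v mod P among n players."""
    s = [random.randrange(P) for _ in range(n - 1)]
    return s + [(v - sum(s)) % P]

def patch_output(shares, macs, alpha, y, y_dummy, honest):
    """Shift one random honest player's share by y - y' and its MAC share by alpha*(y - y')."""
    i = random.choice(honest)
    delta = (y - y_dummy) % P
    shares, macs = list(shares), list(macs)
    shares[i] = (shares[i] + delta) % P
    macs[i] = (macs[i] + alpha * delta) % P
    return shares, macs

# Example: three players, player 0 corrupt, players 1 and 2 honest.
alpha = random.randrange(P)
y_dummy, y = 11, 99                       # y' from the dummy inputs, y from F_Online
shares = additive_sharing(y_dummy, 3)
macs = additive_sharing(alpha * y_dummy % P, 3)
new_shares, new_macs = patch_output(shares, macs, alpha, y, y_dummy, honest=[1, 2])
print(sum(new_shares) % P == y, sum(new_macs) % P == alpha * y % P)   # True True
```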
F Active Security

The following is a sketch of a method for an actively secure version of Π_EncCommit. More specifically, we assume the players have access to an ideal functionality F'_KeyGen which generates the key material as F_KeyGen does, but models active security rather than covert security. Concretely, this just means that there is no "cheat option" that the adversary can choose. The purpose of this section is therefore to describe a protocol Π'_EncCommit which securely implements an ideal functionality F'_SHE in the F'_KeyGen-hybrid model, where F'_SHE behaves as F_SHE but, again, models active security.

The protocol is inspired by the protocol from [22], where a particularly efficient variant of the cut-and-choose approach was developed.

Let P_i be the player who is to produce ciphertexts to be verified by the other players. The protocol is parametrized by two natural numbers T, b where b divides T. We set t = T/b. The protocol will produce as output t ciphertexts c_0, ..., c_{t−1}.

Each such ciphertext is generated according to the algorithm described earlier, and is therefore created from the public key and four polynomials m, v, e_0 and e_1. To make the notation easier to deal with below, we rename these as f_1, f_2, f_3, f_4. We can then observe that there exist ρ_l, for l = 1, ..., 4, such that ||f_l||_∞ ≤ ρ_l except with negligible probability. Concretely, we can use ρ_1 = p/2, ρ_2 = 1 and ρ_3 = ρ_4 = ρ, where ρ can be determined by a tail bound on the Gaussian distribution used for generating f_3, f_4.

The player P_i will also create a set of random reference ciphertexts d_0, ..., d_{2T−1} that are used to verify that c_0, ..., c_{t−1} are well formed and that P_i knows what they contain. Each d_j is created from 4 polynomials g_1, ..., g_4 in the same way as above, but the polynomials are created with a different distribution. Namely, they are random subject to ||g_l||_∞ ≤ 4 · δ · ρ_l · T · φ(m), where δ > 1 is some constant.

The protocol now proceeds as follows:

1. Below, P_i is given some number of attempts to prove that his ciphertexts are correctly formed. The protocol is parametrized by a number M which is the maximal number of allowed attempts. We start by setting a counter v = 1.

2. P_i broadcasts the ciphertexts c_0, ..., c_{t−1} and the reference ciphertexts d_0, ..., d_{2T−1} (containing random plaintexts). These reference ciphertexts should be generated from seeds s_0, ..., s_{2T−1} that are first sent through the random oracle; the output is used to generate the plaintext and randomness for the encryptions.

3. A random index subset 𝒯 ⊂ {0, ..., 2T−1} of size T is chosen, and P_i must broadcast s_j for j ∈ 𝒯. The players check that each opened s_j indeed induces the ciphertext d_j, and abort if this is not the case.

4. A random permutation π on T items is generated and the unopened ciphertexts are permuted according to π. We renumber the permuted ciphertexts and call them d_0, ..., d_{T−1}.

5. Now, for each c_i, the subset of ciphertexts {d_{bi+j} | j = 0, ..., b − 1} is used to demonstrate that c_i is correctly formed. This is called the block of ciphertexts assigned to c_i. We proceed as follows:

(a) For each i, j do the following: let f_1, ..., f_4 and g_1, ..., g_4 be the polynomials used to form c_i and d_{bi+j}, respectively. Define z_l = f_l + g_l, for l = 1, ..., 4.

(b) Player P_i checks that ||z_l||_∞ ≤ 4 · δ · ρ_l · T · φ(m) − ρ_l. If this is the case, he broadcasts z_l, for l = 1, ..., 4. Otherwise he broadcasts ⊥.

(c) In the former case the players check that ||z_l||_∞ is in range for l = 1, ..., 4 and that the z_l's induce the ciphertext c_i + d_{bi+j}.

(d) At the end, the players verify that for each c_i, P_i has correctly opened c_i + d_{bi+j} for all ciphertexts in the block assigned to c_i.

(e) If all checks go through, output c_0, ..., c_{t−1} and exit. Else, if v < M, increment v and go to step 2. Finally, if v = M, the prover has failed to convince us M times, so abort the protocol.
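The index bookkeeping of steps 3-5 (which reference ciphertexts are opened, and how the remaining ones are grouped into blocks) can be sketched in Python as follows. Only indices are modelled; ciphertexts, seeds and the random oracle are left out, and the function name is hypothetical.

```python
import random

def cut_and_choose_blocks(t, b, rng=random):
    """Index bookkeeping for one iteration of steps 3-5 (sketch; no ciphertexts).

    Returns the T opened indices and, for each i < t, the original indices of
    the b unopened reference ciphertexts that become d_{b*i}, ..., d_{b*i+b-1}.
    """
    T = t * b
    all_idx = list(range(2 * T))                  # d_0, ..., d_{2T-1}
    opened = set(rng.sample(all_idx, T))          # step 3: random subset of size T
    unopened = [j for j in all_idx if j not in opened]
    rng.shuffle(unopened)                         # step 4: random permutation pi
    # step 5: the block assigned to c_i is positions b*i, ..., b*i + b - 1
    blocks = {i: unopened[b * i:b * (i + 1)] for i in range(t)}
    return sorted(opened), blocks

opened, blocks = cut_and_choose_blocks(t=12, b=16, rng=random.Random(1))
print(len(opened), len(blocks), len(blocks[0]))   # 192 12 16
```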

It is possible to adapt the protocol to prove that the plaintexts in c_i satisfy certain special properties. For instance, assume we want to ensure that the plaintext polynomial f_1 is a constant polynomial, i.e., only the degree-0 coefficient is non-zero. We do this by generating the reference ciphertexts such that for each d_j, the polynomial g_1 is also a constant polynomial. When opening, we check that the plaintext polynomial is always constant. The proof of security is trivially adapted to this case.
Some intuition for why this works: after half of the reference ciphertexts are opened, we know that, except with exponentially small probability, almost all of the unopened ones are well formed. A simulator will be able to extract randomness and plaintext for all the well-formed ones. When we split the unopened d_j's randomly into blocks of b ciphertexts, it is therefore very unlikely that some block contains only bad ciphertexts. It can be shown that the probability that this happens is at most … · (e · ln(2))^(−…) [22].

Assume P_i is corrupt: if he survives one iteration of the test and no block was completely bad, it follows that for every c_i he has opened at least one c_i + d_{bi+j} where d_{bi+j} was well formed. The simulator can therefore extract a way to open c_i, since c_i = (c_i + d_{bi+j}) − d_{bi+j}, and it will be able to compute polynomials f_l for c_i with ||f_l||_∞ ≤ 8 · δ · ρ_l · T · φ(m). Therefore, if some c_i is not of this form, the prover can survive one iteration of the test with probability at most … · (e · ln(2))^(−…). To survive the entire protocol, the prover needs to win in at least one of the M iterations, which by the union bound happens with probability at most M · … · (e · ln(2))^(−…).

Assume P_i is honest: when he decides whether to open a given ciphertext, the probability that a single coefficient is in range is at least 1 − 1/(4 · δ · T · φ(m)). There are 4 · φ(m) coefficients in a single ciphertext and up to T ciphertexts to open, so by a union bound P_i will not need to send ⊥ at all, except with probability 1/δ. The probability that an honest prover fails to complete the protocol is hence (1/δ)^M. We therefore see that the completeness error vanishes exponentially with increasing M, while in the soundness probability we only lose log M bits of security.

It is easy to see that for each opening done by an honest prover, the polynomials z_l will have coefficients that are uniformly distributed in the expected range, so the protocol can be simulated.

Finally, note that in a normal run of the protocol only 1 iteration is required, except with probability 1/δ. So in practice, what counts for efficiency is the time spent on one iteration.

In our experiments we implemented the above protocol with the parameter choices δ = 256, M = 5, t = 12 and b = 16. This guaranteed a cheating probability of 2^(−40), as well as a probability of 2^(−40) of an honest prover failing. In addition, the choice t = 12 ensured that each run of the protocol created enough ciphertexts for two executions of the main loop of the multiplication-triple production protocol. By increasing t and decreasing b one can improve the amortized complexity of the protocol while keeping the error probabilities the same. This comes at the cost of increased memory usage, primarily because decreasing b to, e.g., b/2 means that t needs to be replaced by essentially t². On our test machines t = 12 seemed to provide the best compromise.
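The completeness figure quoted for these parameters can be reproduced with a few lines of arithmetic. This sketch only evaluates T = t·b, the number 2T of reference ciphertexts, and the honest-prover failure probability (1/δ)^M; it does not reproduce the soundness bound from [22].

```python
import math

delta, M, t, b = 256, 5, 12, 16       # parameter choices quoted above
T = t * b                             # reference ciphertexts left unopened per run
num_reference = 2 * T                 # d_0, ..., d_{2T-1}
honest_fail = (1 / delta) ** M        # an honest prover fails all M attempts
soundness_loss_bits = math.log2(M)    # cost of the union bound over M iterations
print(T, num_reference, math.log2(honest_fail))   # 192 384 -40.0
```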


G Parameters of the BGV Scheme

In this appendix we present an analysis of the parameters needed by the BGV scheme to ensure that the distributed decryption procedure can decrypt the ciphertexts produced in the offline phase and that the scheme is "secure". Unlike [14], which presents a worst-case analysis, we use the expected-case analysis of [16].


G.1 Expected Values of Norms

Given an element a ∈ R (represented as a polynomial) we define ||a||_p to be the standard p-norm of the coefficient vector (usually for p = 1, 2 or ∞). We also define ||a||_p^can to be the p-norm of the same element when mapped into the canonical embedding, i.e.

$$\|a\|_p^{can} = \|\kappa(a)\|_p,$$

where κ: R → ℂ^{φ(m)} is the canonical embedding. The two key relationships are

$$\|a\|_\infty \le c_m\cdot\|a\|_\infty^{can} \qquad\text{and}\qquad \|a\|_\infty^{can} \le \|a\|_1$$

for some constant c_m depending on m. Since in our protocol we select m to be a power of two, we have c_m = 1. We also define the canonical embedding norm reduced modulo q of an element a ∈ R as the smallest canonical embedding norm of any a' which is congruent to a modulo q. We denote it as

$$|a|_q^{can} = \min\{\,\|a'\|_\infty^{can} : a' \in R,\ a' \equiv a \pmod{q}\,\}.$$
We sometimes also denote the polynomial at which the minimum is attained by [a]_q^can, and call it the canonical reduction of a modulo q.
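For m a power of two, the canonical embedding evaluates a polynomial at the φ(m) = m/2 primitive m-th roots of unity. The sketch below (plain Python with cmath, hypothetical function names, naive O(φ(m)²) evaluation) computes ||a||_∞, ||a||_∞^can and ||a||_1, so the two relationships with c_m = 1 can be checked numerically on small examples.

```python
import cmath

def canonical_embedding(coeffs, m):
    """Evaluate a(X) = sum_j coeffs[j] * X^j at all primitive m-th roots of unity.

    Assumes m is a power of two, so phi(m) = m/2 and the primitive roots are
    exp(2*pi*i*k/m) for odd k.
    """
    assert m >= 2 and m & (m - 1) == 0 and len(coeffs) == m // 2
    roots = [cmath.exp(2j * cmath.pi * k / m) for k in range(1, m, 2)]
    return [sum(c * z ** j for j, c in enumerate(coeffs)) for z in roots]

def norms(coeffs, m):
    """Return (||a||_inf, ||a||_inf^can, ||a||_1)."""
    emb = canonical_embedding(coeffs, m)
    return (max(abs(c) for c in coeffs),
            max(abs(v) for v in emb),
            sum(abs(c) for c in coeffs))

inf, can, l1 = norms([3, -1, 0, 2, 5, -4, 1, 0], 16)
print(inf <= can + 1e-9, can <= l1 + 1e-9)   # True True
```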

Following [16, Appendix A.5] we examine the variances of the different distributions used in our protocol. Let ζ_m denote any complex primitive m-th root of unity. Sampling a ∈ R from HWT(h, φ(m)) and looking at a(ζ_m) produces a random variable with variance h; when sampling from ZO(0.5, φ(m)) we obtain variance φ(m)/2; when sampling from DG(σ², φ(m)) we obtain variance σ² · φ(m); and when sampling from U(q, φ(m)) we obtain variance q² · φ(m)/12. By the law of large numbers we can use 6 · √V, where V is the above variance, as a high-probability bound on the size of a(ζ_m), and this provides a bound on the canonical embedding norm of a.
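The variance table and the resulting high-probability bounds can be written down directly. In this sketch the dictionary keys are informal labels for the four distributions named in the text, and `bound` (a hypothetical helper) applies 6·√V for a single element and the constants 16, 9.6 and 7.3 stated in the text for products of two, three and four independent elements.

```python
import math

def variances(phi_m, h, q, sigma):
    """Variance of a(zeta_m) for the four sampling distributions named above."""
    return {
        "HWT(h)": h,
        "ZO(0.5)": phi_m / 2,
        "DG(sigma^2)": sigma ** 2 * phi_m,
        "U(q)": q ** 2 * phi_m / 12,
    }

def bound(*vs):
    """High-probability bound on |a_1(zeta_m) * ... * a_k(zeta_m)| for k = 1..4."""
    const = {1: 6.0, 2: 16.0, 3: 9.6, 4: 7.3}[len(vs)]
    return const * math.sqrt(math.prod(vs))

# e.g. a product e_1 * s with Var(e_1) = sigma^2 * phi(m) and Var(s) = n * h
v = variances(phi_m=8192, h=64, q=2**32, sigma=3.2)
n = 3
print(bound(v["DG(sigma^2)"], n * v["HWT(h)"]))
```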

If we take a product of two, three, or four such elements with variances V_1, V_2, ..., V_4, we use 16 · √(V_1 · V_2), 9.6 · √(V_1 · V_2 · V_3) and 7.3 · √(V_1 · V_2 · V_3 · V_4) as the resulting bounds, since

$$\mathrm{erfc}(4)^2 \approx \mathrm{erfc}(3.1)^3 \approx \mathrm{erfc}(2.7)^4 \approx 2^{-50}.$$
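The erfc approximation can be checked with Python's `math.erfc`. The sketch evaluates log2(erfc(x)^k) for the three pairs in the text, working in the log domain to avoid underflow; it also prints x·x, which numerically matches the constants 16, 9.6 and 7.3 up to rounding (an observation, not a claim about how the constants were derived).

```python
import math

CASES = [(4.0, 2, 16.0), (3.1, 3, 9.6), (2.7, 4, 7.3)]   # (x, k, constant used above)

def log2_tail(x, k):
    """log2 of erfc(x)**k, computed as k * log2(erfc(x))."""
    return k * math.log2(math.erfc(x))

for x, k, const in CASES:
    print(f"erfc({x})^{k} ~ 2^{log2_tail(x, k):.1f}; x*x = {x * x:.2f}, constant = {const}")
```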


G.2 Key Generation


We first need to establish the rough distributions (i.e. variances) of the keys arising from our key generation procedure. For our purposes we are only interested in the variance of the associated distributions in the canonical embedding, in which case we obtain

$$\begin{aligned}
\mathrm{Var}(\kappa(s_j)) &= n\cdot\mathrm{Var}(\kappa(s_{i,j})) = n\cdot h,\\
\mathrm{Var}(\kappa(a_j)) &= q_1^2\cdot\phi(m)/12,\\
\mathrm{Var}(\kappa(e_j)) &= n\cdot\mathrm{Var}(\kappa(e_{i,j})) = n\cdot\sigma^2\cdot\phi(m).
\end{aligned}$$


We will also need to analyze the distributions of the randomness needed to produce enc_j. Here we assume that all parties follow the protocol and we are only interested in the final output extended public key; thus we write (dropping the j to avoid overloading the reader)

$$\mathrm{enc} = (b_{s,s^2},\ a_{s,s^2}),$$

where

$$b_{s,s^2} = a_{s,s^2}\cdot s + p\cdot e_{s,s^2} - p_1\cdot s^2.$$

We can also write

$$\begin{aligned}
\mathrm{enc}'_i &= (b\cdot v_i + p\cdot e_{0,i} - p_1\cdot s_i,\ a\cdot v_i + p\cdot e_{1,i}),\\
\mathrm{zero}_i &= (b\cdot v'_i + p\cdot e'_{0,i},\ a\cdot v'_i + p\cdot e'_{1,i}),
\end{aligned}$$

where (v_i, e_{0,i}, e_{1,i}) ← RC_s(0.5, σ², φ(m)) and (v'_i, e'_{0,i}, e'_{1,i}) ← RC_s(0.5, σ², φ(m)). We therefore have


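$$a_{s,s^2} = \sum_{i=1}^{n}\Big(s_i\cdot\sum_{j=1}^{n}\big(a\cdot v_j + p\cdot e_{1,j}\big) + a\cdot v'_i + p\cdot e'_{1,i}\Big)$$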




and

$$\begin{aligned}
b_{s,s^2} &= \sum_{i=1}^{n}\Big(s_i\cdot\sum_{j=1}^{n}\big(b\cdot v_j + p\cdot e_{0,j} - p_1\cdot s_j\big) + b\cdot v'_i + p\cdot e'_{0,i}\Big)\\
&= a_{s,s^2}\cdot s + \sum_{i=1}^{n}\Big(\sum_{j=1}^{n}\big(p\cdot e\cdot v_j\cdot s_i + p\cdot e_{0,j}\cdot s_i - p_1\cdot s_i\cdot s_j - p\cdot s\cdot s_i\cdot e_{1,j}\big) + p\cdot e\cdot v'_i + p\cdot e'_{0,i} - p\cdot e'_{1,i}\cdot s\Big)\\
&= a_{s,s^2}\cdot s + p\cdot e_{s,s^2} - p_1\cdot s^2,
\end{aligned}$$

where, using b = a · s + p · e and s = s_1 + ... + s_n,

$$e_{s,s^2} = \sum_{i=1}^{n}\Big(\sum_{j=1}^{n}\big(e\cdot v_j\cdot s_i + e_{0,j}\cdot s_i - s\cdot s_i\cdot e_{1,j}\big) + e\cdot v'_i + e'_{0,i} - e'_{1,i}\cdot s\Big). \qquad (2)$$

Thus the values enc are indeed genuine "quasi-encryptions" of −p_1 · s² with respect to the secret key s and the modulus q_1. Equation 2 will be used later to establish the properties of the output of the SwitchKey procedure.
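Since the derivation of Equation 2 only uses the ring axioms, it can be sanity-checked with integers standing in for ring elements. The sketch assumes, consistently with the expressions for a_{s,s²} and b_{s,s²} above, that enc is obtained by summing s_i · enc' + zero_i over the players, where enc' = Σ_j enc'_j; the small value ranges, the moduli and the function name are arbitrary choices.

```python
import random

def check_equation_2(n=3, p=7, p1=101, seed=0):
    """Check b_{s,s^2} = a_{s,s^2}*s + p*e_{s,s^2} - p1*s^2 with integers as ring elements."""
    rng = random.Random(seed)

    def small():
        return rng.randrange(-50, 51)

    a, e = small(), small()
    s_i = [small() for _ in range(n)]
    s = sum(s_i)
    b = a * s + p * e                                   # public key (a, b)
    v, e0, e1 = ([small() for _ in range(n)] for _ in range(3))
    vp, e0p, e1p = ([small() for _ in range(n)] for _ in range(3))
    # enc' = sum_j enc'_j, where enc'_j encrypts -p1 * s_j
    b_enc = sum(b * v[j] + p * e0[j] - p1 * s_i[j] for j in range(n))
    a_enc = sum(a * v[j] + p * e1[j] for j in range(n))
    # enc = sum_i (s_i * enc' + zero_i)
    b_ss = sum(s_i[i] * b_enc + b * vp[i] + p * e0p[i] for i in range(n))
    a_ss = sum(s_i[i] * a_enc + a * vp[i] + p * e1p[i] for i in range(n))
    # Equation (2)
    e_ss = sum(
        sum(e * v[j] * s_i[i] + e0[j] * s_i[i] - s * s_i[i] * e1[j] for j in range(n))
        + e * vp[i] + e0p[i] - e1p[i] * s
        for i in range(n)
    )
    return b_ss == a_ss * s + p * e_ss - p1 * s * s

print(all(check_equation_2(n=k, seed=k) for k in range(1, 6)))   # True
```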

G.3 BGV Procedures

We now turn to each of the procedures of the two-level BGV scheme we are using, and estimate the output noise term of each. For a ciphertext c = (c_0, c_1, ℓ) we define the "noise" to be an upper bound on the value

$$\|c_0 - s\cdot c_1\|_\infty^{can}.$$

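Enc_pk(m): Given a fresh ciphertext (c_0, c_1, 1), we calculate a bound (with high probability) on the output noise by

$$\begin{aligned}
\|c_0 - s\cdot c_1\|_\infty &\le \|c_0 - s\cdot c_1\|_\infty^{can}\\
&= \|(a\cdot s + p\cdot e)\cdot v + p\cdot e_0 + m - (a\cdot v + p\cdot e_1)\cdot s\|_\infty^{can}\\
&= \|m + p\cdot(e\cdot v + e_0 - e_1\cdot s)\|_\infty^{can}\\
&\le \|m\|_\infty^{can} + p\cdot\big(\|e\cdot v\|_\infty^{can} + \|e_0\|_\infty^{can} + \|e_1\cdot s\|_\infty^{can}\big)\\
&\le \phi(m)\cdot p/2 + p\cdot\sigma\cdot\Big(16\cdot\phi(m)\cdot\sqrt{n/2} + 6\cdot\sqrt{\phi(m)} + 16\cdot\sqrt{n\cdot h\cdot\phi(m)}\Big) = B_{clean}.
\end{aligned}$$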

Note that this value of B_clean is different from that in [16] due to the different distributions resulting from the distributed key generation.
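B_clean is a closed-form expression in φ(m), p, σ, n and h, so it is easy to evaluate for candidate parameters. A minimal sketch with a hypothetical function name; the comments name the norm each term bounds.

```python
import math

def b_clean(phi_m, p, sigma, n, h):
    """B_clean = phi(m)*p/2 + p*sigma*(16*phi(m)*sqrt(n/2) + 6*sqrt(phi(m)) + 16*sqrt(n*h*phi(m)))."""
    return phi_m * p / 2 + p * sigma * (
        16 * phi_m * math.sqrt(n / 2)        # ||e * v||^can
        + 6 * math.sqrt(phi_m)               # ||e_0||^can
        + 16 * math.sqrt(n * h * phi_m)      # ||e_1 * s||^can
    )

print(math.log2(b_clean(phi_m=8192, p=2**32, sigma=3.2, n=3, h=64)))
```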

SwitchModulus((c_0, c_1), ℓ): If the input ciphertext has noise ν, then the output ciphertext will have noise ν', where

$$\nu' = \frac{\nu}{p_1} + B_{scale}.$$
The value B_scale is an upper bound on the quantity ||τ_0 + τ_1 · s||_∞^can, where κ(τ_i) is drawn from a distribution which is close to a complex Gaussian with variance φ(m) · p²/12. Therefore we can (with high probability) take the upper bound to be

$$\begin{aligned}
B_{scale} &= 6\cdot p\cdot\sqrt{\phi(m)/12} + 16\cdot p\cdot\sqrt{n\cdot\phi(m)\cdot h/12}\\
&= p\cdot\sqrt{3\cdot\phi(m)}\cdot\Big(1 + \frac{8\cdot\sqrt{n\cdot h}}{3}\Big).
\end{aligned}$$

Again, note the dependence on n (compared to [16]), as the secret key s is selected from a distribution with variance n · h, and not just h. Also note the dependence on p, due to the plaintext space being defined mod p as opposed to mod 2 in [16].
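The two expressions for B_scale agree because 6·√(φ(m)/12) = √(3·φ(m)) and 16·√(n·φ(m)·h/12) = √(3·φ(m))·8·√(n·h)/3. The sketch below (hypothetical function name) evaluates both forms so the simplification can be checked numerically.

```python
import math

def b_scale(phi_m, p, n, h):
    """Return B_scale in its direct form and in its simplified form."""
    direct = 6 * p * math.sqrt(phi_m / 12) + 16 * p * math.sqrt(n * phi_m * h / 12)
    compact = p * math.sqrt(3 * phi_m) * (1 + 8 * math.sqrt(n * h) / 3)
    return direct, compact

direct, compact = b_scale(phi_m=8192, p=2**32, n=3, h=64)
print(math.isclose(direct, compact))   # True
```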


Dec_s(c): As explained in [14, 16], this procedure works when the noise ν associated with a ciphertext satisfies c_m · ν < q_ℓ/2.

DistDec_{s_i}(c): The value B is an upper bound on the noise ν associated with a ciphertext we will decrypt in our protocols. To ensure valid distributed decryption we require

$$2\cdot(1 + 2^{sec})\cdot B < p_0.$$

Given a value of B, we therefore obtain a lower bound on p_0 from the above inequality. The addition of a random term with infinity norm bounded by 2^{sec} · B/(n · p) in the distributed decryption procedure ensures that the individual coefficients of the sum t_1 + ... + t_n are statistically indistinguishable from random, up to statistical distance 2^{−sec}. This does not imply that the adversary has only this probability of distinguishing the simulated execution in [14] from the real execution, since each run consists of the exchange of φ(m) coefficients, and the procedure is executed many times over the course of the whole protocol. We nevertheless feel that concentrating solely on the statistical indistinguishability of the coefficients is valid in a practical context.


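SwitchKey(d_0, d_1, d_2): In order to estimate the size of the output noise term, we first need to estimate the size of the term

$$\|p\cdot d_2\cdot e_{s,s^2}\|_\infty^{can}.$$

Using Equation 2 we find

$$\begin{aligned}
\|p\cdot d_2\cdot e_{s,s^2}\|_\infty^{can} \le\ & p\cdot\sqrt{\frac{p_0^2\cdot\phi(m)}{12}}\cdot\sigma\cdot\Big(n\cdot\big(7.3\cdot\sqrt{n\cdot h\cdot\phi(m)^2/2} + 9.6\cdot\sqrt{h\cdot\phi(m)} + 7.3\cdot h\cdot\sqrt{n\cdot\phi(m)}\big)\\
& \qquad + n\cdot\big(9.6\cdot\sqrt{n\cdot\phi(m)^2/2} + 16\cdot\sqrt{\phi(m)} + 9.6\cdot\sqrt{n\cdot h\cdot\phi(m)}\big)\Big)\\
\le\ & p_0\cdot p\cdot\phi(m)\cdot\sigma\cdot\Big(n^{3/2}\cdot\big(1.49\cdot\sqrt{h\cdot\phi(m)} + 2.11\cdot h\big) + 2.77\cdot n\cdot\sqrt{h}\\
& \qquad + n^{3/2}\cdot\big(1.96\cdot\sqrt{\phi(m)} + 2.77\cdot\sqrt{h}\big) + 4.62\cdot n\Big)\\
=\ & p_0\cdot B_{KS}.
\end{aligned}$$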

Then if the input to SwitchKey has noise bounded by $\nu$, the output noise value will be bounded by

$$\nu + \frac{B_{KS} \cdot q_0}{p_1} + B_{\mathrm{scale}}.$$


Mult$(c, c')$: Combining all the above, if we take two ciphertexts of level one with input noise bounded by $\nu$ and $\nu'$, the output noise level from multiplication will be bounded by

$$\left(\frac{\nu}{p_1} + B_{\mathrm{scale}}\right) \cdot \left(\frac{\nu'}{p_1} + B_{\mathrm{scale}}\right) + \frac{B_{KS} \cdot q_0}{p_1} + B_{\mathrm{scale}}.$$
G.4 Application to the Offline Phase

In all of our protocols we will only be evaluating the following circuit: we first add $n$ ciphertexts together and perform a multiplication, giving a ciphertext with respect to modulus $p_0$ with noise

$$U_1 = \left(\frac{n \cdot B_{\mathrm{clean}}}{p_1} + B_{\mathrm{scale}}\right)^2 + \frac{B_{KS} \cdot p_0}{p_1} + B_{\mathrm{scale}}.$$


We then add on another $n$ ciphertexts, which are added at level one and then reduced to level zero. We therefore obtain a final upper bound on the noise for our adversarially generated ciphertexts of

$$U_2 = U_1 + \frac{n \cdot B_{\mathrm{clean}}}{p_1} + B_{\mathrm{scale}}.$$


To ensure valid (distributed) decryption, we require

$$2 \cdot U_2 \cdot (1 + 2^{\mathrm{sec}}) < p_0,$$

i.e. we take $B = U_2$ in our distributed decryption protocol.
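The noise recurrences above can be evaluated mechanically. The following Python sketch (the function and parameter names are hypothetical, not taken from the implementation) encodes the Mult bound, the offline-phase bounds $U_1$ and $U_2$ (taking $q_0 = p_0$), and the distributed-decryption condition, using exact rational arithmetic. The inputs $B_{\mathrm{clean}}$, $B_{\mathrm{scale}}$ and $B_{KS}$ are assumed to be supplied from their own bounds; they are not derived here, and the numbers in the usage line are toy values only.

```python
from fractions import Fraction


def mult_noise(nu, nu_prime, p1, q0, b_ks, b_scale):
    """Bound on the output noise of Mult on two level-one ciphertexts:
    (nu/p1 + B_scale) * (nu'/p1 + B_scale) + B_KS * q0 / p1 + B_scale.
    """
    left = Fraction(nu) / p1 + b_scale
    right = Fraction(nu_prime) / p1 + b_scale
    return left * right + Fraction(b_ks * q0) / p1 + b_scale


def offline_noise(n, b_clean, p0, p1, b_ks, b_scale):
    """Return (U1, U2) for the offline-phase circuit, assuming q0 = p0."""
    # Add n fresh ciphertexts on each side (noise n * B_clean), then multiply.
    u1 = mult_noise(n * b_clean, n * b_clean, p1, p0, b_ks, b_scale)
    # Add n more fresh ciphertexts at level one, then reduce to level zero.
    u2 = u1 + Fraction(n * b_clean) / p1 + b_scale
    return u1, u2


def decryption_ok(u2, p0, sec):
    """Distributed decryption condition 2 * U2 * (1 + 2^sec) < p0."""
    return 2 * u2 * (1 + 2**sec) < p0


# Toy numbers only, chosen so the arithmetic is easy to follow by hand.
u1, u2 = offline_noise(n=2, b_clean=5, p0=10**6, p1=10**4, b_ks=3, b_scale=1)
print(u1, u2, decryption_ok(u2, p0=10**6, sec=10))
```

Because $U_1$ depends on $p_0$ through the key-switching term $B_{KS} \cdot p_0/p_1$, a parameter search has to choose $p_0$ and $p_1$ jointly rather than one after the other.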

This ensures valid decryption in our offline phase; however, we still need to select the parameters to ensure security. Following the analysis in [16] of the BGV scheme we set, for 128-bit security,

$$\phi(m) \geq 33.1 \cdot \log\left(\frac{q_1}{\sigma}\right).$$


Combining the various inequalities, a search of the parameter space with the fixed values $\sigma = 3.2$, $\mathrm{sec} = 40$ and $h = 64$, for several choices of $p$ and $n$, yields the estimates in Tables 4, 5 and 6. It is these parameter sizes which we use to generate the primes and rings in our implementation.


n     φ(m)    log2 p0   log2 p1   log2 q1   log2 U2
2     8192    130       104       234       89
3     8192    132       104       236       90
4     8192    132       104       236       91
5     8192    132       106       238       90
6     8192    132       106       238       91
7     8192    132       108       240       91
8     8192    132       108       240       91
9     8192    132       110       242       91
10    8192    132       110       242       91
20    8192    134       110       244       93
50    8192    136       114       250       94
100   8192    136       116       252       95

Table 4. Parameters for $p \approx 2^{32}$.
n     φ(m)    log2 p0   log2 p1   log2 q1   log2 U2
2     16384   196       136       332       154
3     16384   196       138       334       154
4     16384   196       140       336       155
5     16384   196       142       338       155
6     16384   198       140       338       156
7     16384   198       140       338       156
8     16384   198       140       338       157
9     16384   198       142       340       156
10    16384   198       142       340       156
20    16384   198       146       344       157
50    16384   200       148       348       158
100   16384   202       150       352       160

Table 5. Parameters for $p \approx 2^{64}$.


n     φ(m)    log2 p0   log2 p1   log2 q1   log2 U2
2     32768   324       202       526       283
3     32768   326       202       528       285
4     32768   326       204       530       284
5     32768   326       204       530       285
6     32768   326       206       532       284
7     32768   326       206       532       285
8     32768   326       208       534       285
9     32768   326       208       534       285
10    32768   326       208       534       285
20    32768   328       210       538       286
50    32768   330       212       542       289
100   32768   330       216       546       288

Table 6. Parameters for $p \approx 2^{128}$.
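As a consistency check on Tables 4 to 6, the sketch below (our own illustration, not part of the implementation described above) verifies two relations row by row: the two-level modulus chain $q_1 = p_0 \cdot p_1$, assumed here, i.e. $\log_2 q_1 = \log_2 p_0 + \log_2 p_1$; and the decryption margin $2 \cdot U_2 \cdot (1 + 2^{\mathrm{sec}}) < p_0$ with $\mathrm{sec} = 40$. Because the tables list rounded logarithms, the margin is tested only approximately, as $1 + \log_2 U_2 + \mathrm{sec} \le \log_2 p_0$.

```python
SEC = 40

# Rows: (n, log2 p0, log2 p1, log2 q1, log2 U2), copied from Tables 4-6.
TABLE_4 = [  # p ~ 2^32, phi(m) = 8192
    (2, 130, 104, 234, 89), (3, 132, 104, 236, 90), (4, 132, 104, 236, 91),
    (5, 132, 106, 238, 90), (6, 132, 106, 238, 91), (7, 132, 108, 240, 91),
    (8, 132, 108, 240, 91), (9, 132, 110, 242, 91), (10, 132, 110, 242, 91),
    (20, 134, 110, 244, 93), (50, 136, 114, 250, 94), (100, 136, 116, 252, 95),
]
TABLE_5 = [  # p ~ 2^64, phi(m) = 16384
    (2, 196, 136, 332, 154), (3, 196, 138, 334, 154), (4, 196, 140, 336, 155),
    (5, 196, 142, 338, 155), (6, 198, 140, 338, 156), (7, 198, 140, 338, 156),
    (8, 198, 140, 338, 157), (9, 198, 142, 340, 156), (10, 198, 142, 340, 156),
    (20, 198, 146, 344, 157), (50, 200, 148, 348, 158), (100, 202, 150, 352, 160),
]
TABLE_6 = [  # p ~ 2^128, phi(m) = 32768
    (2, 324, 202, 526, 283), (3, 326, 202, 528, 285), (4, 326, 204, 530, 284),
    (5, 326, 204, 530, 285), (6, 326, 206, 532, 284), (7, 326, 206, 532, 285),
    (8, 326, 208, 534, 285), (9, 326, 208, 534, 285), (10, 326, 208, 534, 285),
    (20, 328, 210, 538, 286), (50, 330, 212, 542, 289), (100, 330, 216, 546, 288),
]


def check_row(row, sec=SEC):
    """True if the row satisfies the modulus chain and the decryption margin."""
    _n, lp0, lp1, lq1, lu2 = row
    chain_ok = lq1 == lp0 + lp1          # q1 = p0 * p1
    margin_ok = 1 + lu2 + sec <= lp0     # 2 * U2 * 2^sec <= p0, in log2 terms
    return chain_ok and margin_ok


for name, table in (("Table 4", TABLE_4), ("Table 5", TABLE_5), ("Table 6", TABLE_6)):
    bad = [row[0] for row in table if not check_row(row)]
    print(name, "all rows consistent" if not bad else f"check n = {bad}")
```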
Dishonest Majority Multi-Party Computation for Binary Circuits

Abstract. We extend the Tiny-OT two-party protocol of Nielsen et al. (CRYPTO 2012) to the case of n parties in the dishonest majority setting. This is done by presenting a novel way of transferring pairwise authentications into global authentications. As a by-product we obtain a more efficient manner of producing globally authenticated shares, which in turn leads to a more efficient two-party protocol than that of Nielsen et al.


Keywords:  Secure  Multi-party  Computation,  Message  Authentication  Code,  Oblivious  Transfer 

1  Introduction 

In recent years actively secure MPC has moved from a theoretical subject into one which is becoming more practical. In the variants of multi-party computation which are based on secret sharing, the major performance improvement has come from the technique of authenticating the shared data and/or the shares themselves using information-theoretic message authentication codes (MACs). This idea has been used in a number of works: in the case of two-party MPC for binary circuits in [13], for n-party dishonest majority MPC for arithmetic circuits over a “largish” finite field [4,7], and for n-party dishonest majority MPC over binary circuits [8]. All of these protocols are in the pre-processing model, in which the parties first engage in a function- and input-independent offline phase. The offline phase produces various pieces of data, often Beaver-style [3] “multiplication triples”, which are then consumed in the online phase when the function is determined and evaluated.

In the case of the protocol of [13], called Tiny-OT in what follows, the authors apply information-theoretic MACs to the oblivious transfer (OT) based GMW protocol [10] in the two-party setting. In this protocol the offline phase consists of producing a set of pre-processed random OTs which have been authenticated. The offline phase is executed efficiently using a variant of the OT extension protocol of [12]. For a detailed discussion of OT extension see [2,12,13]. In this work we shall take OT extension as a given sub-procedure.

For the case of $n$-party protocols, where $n > 2$, there are three main techniques using such MACs. In [4] each share of a given secret is authenticated by pairwise MACs, i.e. if party $P_i$ holds a share $a_i$, then it will also hold a MAC $M_{ij}$ for every $j \neq i$, and party $P_j$ will hold a key $K_{ij}$. Then, when the value $a_i$ is made public, party $P_i$ also reveals the $n - 1$ MAC values, which are then checked by the other parties using their private keys $K_{ij}$. Note that each pair of parties holds a separate key/MAC for each share value. In [7] the authors obtain a more efficient online protocol by replacing the MACs from [4] with global MACs which authenticate the shared values $a$, as opposed to the shares themselves. The authentication is also done with respect to a fixed global MAC key (and not pairwise and data dependent). This method was improved in [6], where it is shown how to verify these global MACs without revealing the secret global key. In [8] the authors adapt the technique from [7] to the case of small finite fields, in a way which allows one to authenticate multiple field elements at the same time, without requiring multiple MACs. This is performed using a novel application of ideas from coding theory, and results in a reduced overhead for the online phase.

One can think of the Tiny-OT protocol as applying the authentication technique of [4] to the two-party, binary circuit case, with a pre-processing which is based on OT as opposed to semi-homomorphic encryption. For two-party protocols over binary circuits, practical experiments show that Tiny-OT far outperforms other protocols, such as those based on Yao’s garbled circuit technique. This is because of the performance of the offline phase of the Tiny-OT protocol. Thus a natural question is whether one can extend the Tiny-OT protocol to the n-party setting for binary circuits.


Results and Techniques. In this paper we mainly address ourselves to the above question, i.e. how can we generalize the two-party protocol from [13] to the n-party setting?

We first describe the key technical difficulties we need to overcome. The Tiny-OT protocol at its heart has a method for authenticating random bits via pairwise MACs, which itself is based on an efficient protocol for OT-extension. In [13] this protocol is called aBit. Our aim is to use this efficient two-party process as a black box. Unfortunately, if we extend this procedure naively to the three-party case, we would obtain (for example) that parties $P_1$ and $P_2$ could execute the protocol so that $P_1$ obtains a random bit and a MAC, whilst $P_2$ obtains a key for the MAC used to authenticate the random bit. However, party $P_3$ obtains no authentication on the random bit obtained by $P_1$, nor does it obtain any information as to the MAC or the key.

To overcome this difficulty, we present a protocol in which we fix an unknown global random key and where each party holds a share of this key. Then, by executing the pairwise aBit protocol, we are able to obtain a secret shared value, as well as a shared MAC, by all $n$ parties. The resulting MAC is identical to the MAC used in the SPDZ protocol from [6]. This allows us to obtain authenticated random shares, and in addition permits parties to enter their inputs into the MPC protocol.

The online phase will then follow similarly to [6], if we can realize a protocol to produce “multiplication triples”. In [13] one can obtain such triples by utilizing a complex method to produce authenticated random OTs and authenticated random ANDs (called aOTs and aANDs)¹. We notice that our method for obtaining authenticated bits also enables us to obtain a form of authenticated OTs in a relatively trivial manner, and such authenticated OTs can be used directly to implement a multiplication gate in the online phase.

Our contribution is twofold. First, we generalize the two-party Tiny-OT protocol to the n-party setting, using a new technique for authentication of secret shared bits, and new offline and online phases. Thus we are able to dispense with the protocols to generate aOTs and aANDs from [13]. Second, and as a by-product, we obtain a more efficient protocol than the original Tiny-OT protocol in the two-party setting, when one measures efficiency in terms of the number of aBits needed per multiplication gate.

The security of our protocols is proven in the standard universal composability (UC) framework [5] against a malicious adversary and static corruption of parties.

We end this introduction by describing two possible extensions to our work. Firstly, each bit in our protocol is authenticated by an element in a finite field $\mathbb{F}_{2^\kappa}$. Whilst such values are never transmitted in our online phase due to our MACCheck protocol, they do add an overhead to the computation. In [8] the authors show how to reduce this overhead using coding theory techniques. It would be interesting to see how such techniques could be applied to our protocol, and what advantage, if any, they would bring.

¹ In fact the paper [13] does not produce such multiplication triples, but they follow immediately from the presentation in the paper and would result in a more efficient online phase than that described in [13].

Secondly, our protocol requires $n \cdot (n - 1)/2$ executions of the aBit protocol from [13]. Each pairwise invocation requires the execution of an OT-extension protocol, and hence we require $O(n^2)$ such OT-channels. In [11], in the context of traditional MPC protocols, the authors present techniques and situations in which the number of OT-channels can be reduced to $O(n)$. It would be interesting to see how such techniques could be applied in practice to the protocol described in this paper.

2  Notation 

In this section we settle the notation used throughout the paper. We use $\kappa$ to denote the security parameter. We let $\mathrm{negl}(\kappa)$ denote some unspecified function $f(\kappa)$ such that $f = o(\kappa^{-c})$ for every fixed constant $c$, saying that such a function is negligible in $\kappa$. We say that a probability is overwhelming in $\kappa$ if it is $1 - \mathrm{negl}(\kappa)$.

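We consider the sets $\{0, 1\}$ and $\mathbb{F}_2^\kappa$ endowed with the structure of the fields $\mathbb{F}_2$ and $\mathbb{F}_{2^\kappa}$, respectively. Let $\mathbb{F} = \mathbb{F}_{2^\kappa}$; we will denote elements in $\mathbb{F}$ with Greek letters and elements in $\mathbb{F}_2$ with Roman letters.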

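We will additively secret share bits and elements in $\mathbb{F}$ among a set of parties $\mathcal{P} = \{P_1, \ldots, P_n\}$, and sometimes abuse notation by identifying subsets $\mathcal{X} \subseteq \{1, \ldots, n\}$ with the subset of parties indexed by $i \in \mathcal{X}$. We write $\langle a\rangle^{\mathcal{X}}$ if $a$ is shared amongst the set $\mathcal{X} = \{i_1, \ldots, i_t\}$ with party $P_{i_j}$ holding a value $a_{i_j}$, such that $\sum_{i_j \in \mathcal{X}} a_{i_j} = a$. Also, if an element $x \in \mathbb{F}_2$ (resp. $\beta \in \mathbb{F}$) is additively shared among all parties we write $\langle x\rangle$ (resp. $\langle \beta\rangle$). We adopt the convention that if $a \in \mathbb{F}_2$ (resp. $\beta \in \mathbb{F}$) then the shares $a_i \in \mathbb{F}_2$ (resp. $\beta_i \in \mathbb{F}$).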

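(Linear) arithmetic on the $\langle\cdot\rangle^{\mathcal{X}}$ sharings can be performed as follows. Given two sharings $\langle x\rangle^{\mathcal{X}_x} = \{x_{i_j}\}_{i_j \in \mathcal{X}_x}$ and $\langle y\rangle^{\mathcal{X}_y} = \{y_{i_j}\}_{i_j \in \mathcal{X}_y}$ we can compute the following linear operations:

$$a \cdot \langle x\rangle^{\mathcal{X}_x} = \{a \cdot x_{i_j}\}_{i_j \in \mathcal{X}_x},$$

$$a + \langle x\rangle^{\mathcal{X}_x} = \{a + x_{i_1}\} \cup \{x_{i_j}\}_{i_j \in \mathcal{X}_x \setminus \{i_1\}},$$

$$\langle x\rangle^{\mathcal{X}_x} + \langle y\rangle^{\mathcal{X}_y} = \langle x + y\rangle^{\mathcal{X}_x \cup \mathcal{X}_y} = \{x_{i_j}\}_{i_j \in \mathcal{X}_x \setminus \mathcal{X}_y} \cup \{y_{i_j}\}_{i_j \in \mathcal{X}_y \setminus \mathcal{X}_x} \cup \{x_{i_j} + y_{i_j}\}_{i_j \in \mathcal{X}_x \cap \mathcal{X}_y}.$$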
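To make the index bookkeeping concrete, here is a minimal Python sketch of the three linear operations on $\langle\cdot\rangle^{\mathcal{X}}$ sharings of bits, representing a sharing as a dictionary from party index to share (addition in $\mathbb{F}_2$ is XOR). Taking $i_1$ to be the smallest index in $\mathcal{X}_x$ is a hypothetical convention of this sketch; any fixed party in $\mathcal{X}_x$ would do.

```python
from functools import reduce
from operator import xor


def reconstruct(sh):
    """Recover the shared bit: the sum in F_2 (XOR) of all shares."""
    return reduce(xor, sh.values(), 0)


def scalar_mul(a, sh):
    """a * <x>^X: every party in X multiplies its share by the public bit a."""
    return {i: a & s for i, s in sh.items()}


def add_const(a, sh):
    """a + <x>^X: only party i_1 (here the smallest index) adds a."""
    i1 = min(sh)
    out = dict(sh)
    out[i1] ^= a
    return out


def add(sh_x, sh_y):
    """<x>^Xx + <y>^Xy = <x + y>^(Xx U Xy).

    Parties only in Xx keep x_i, parties only in Xy keep y_i,
    and parties in both sets add their two shares.
    """
    return {i: sh_x.get(i, 0) ^ sh_y.get(i, 0) for i in sh_x.keys() | sh_y.keys()}


x = {1: 1, 2: 0}   # a sharing of x = 1 among {P_1, P_2}
y = {2: 1, 3: 1}   # a sharing of y = 0 among {P_2, P_3}
print(reconstruct(add(x, y)), reconstruct(add_const(1, x)), reconstruct(scalar_mul(0, x)))
```

Note that the sum of two sharings is held by the union of the two index sets, which is exactly why the notation tracks $\mathcal{X}$ explicitly.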

Our protocols will make use of pseudo-random functions, which we will denote by $\mathrm{PRF}^{\mathcal{X},t}_s(\cdot)$, where for a key $s$ and input $m \in \{0,1\}^*$ the pseudo-random function is defined by $\mathrm{PRF}^{\mathcal{X},t}_s(m) \in \mathcal{X}^t$, where $\mathcal{X}$ is some set and $t$ is a non-negative integer.

Authentication of Secret Shared Values. As described in the introduction, the literature gives two ways to authenticate a secret globally held by a system of parties: one is to authenticate the shares of each party, as in [4]; the other is to authenticate the secret itself, as in [7]. In addition, authentication can be performed in a pairwise manner, as in [4,13], or in a global manner, as in [7]. Any combination of these variants can be applied, but each implies important practical differences, e.g., the total amount of data each party needs to store and how checking of the MACs is performed. In this work we will use a combination of different techniques; indeed the main technical trick is a method to pass from the technique used in [13] to the technique used in [7].
Our main technique for authentication of secret shared bits is applied by placing an information-theoretic tag (MAC) on the shared bit $x$. The authenticating key is a random line in $\mathbb{F}$, and the MAC on $x$ is its corresponding line point; thus, the linear equation $\mu_\delta(x) = \nu_\delta(x) + x \cdot \delta$ holds, for some $\mu_\delta(x), \nu_\delta(x), \delta \in \mathbb{F}$. We will use these lines in various operations², for various values of $\delta$. In particular, there will be a special value of $\delta$, which we denote by $\alpha$ and assume to be $\langle\alpha\rangle$ shared, which represents the global key for our online MPC protocol. This will be the same key for every bit that needs to be authenticated. It will turn out that for the key $\alpha$ we always have $\nu_\alpha(x) = 0$. By abuse of notation we will sometimes refer to a general $\delta$ also as a global key, and then the corresponding $\nu_\delta(x)$ is called the local key.

Distinguishing between the parties, say $\mathcal{I}$, that can reconstruct bits (together with the line point), and the parties, say $\mathcal{J}$, that can reconstruct the line gives a natural generalization of both ways to authenticate, and it also allows us to move easily from one to the other. We write $[x]^{\mathcal{I}}_{\delta,\mathcal{J}}$ if there exist $\mu_\delta(x), \nu_\delta(x) \in \mathbb{F}$ such that

$$\mu_\delta(x) = \nu_\delta(x) + x \cdot \delta,$$

where $x$ and $\mu_\delta(x)$ are $\langle\cdot\rangle^{\mathcal{I}}$ shared, and $\nu_\delta(x)$ and $\delta$ are $\langle\cdot\rangle^{\mathcal{J}}$ shared, i.e. there are values $x_i$, $\mu_i$, $\nu_j$ and $\delta_j$ such that

$$x = \sum_{i \in \mathcal{I}} x_i, \qquad \mu_\delta(x) = \sum_{i \in \mathcal{I}} \mu_i, \qquad \nu_\delta(x) = \sum_{j \in \mathcal{J}} \nu_j, \qquad \delta = \sum_{j \in \mathcal{J}} \delta_j.$$

Notice that $\mu_\delta(x)$ and $\nu_\delta(x)$ depend on $\delta$ and $x$: we can fix $\delta$ and so obtain key-consistent representations of bits, or we can fix $x$ and obtain different key-dependent representations for the same bit $x$. To ease the reading, we drop the sub-index $\mathcal{J}$ if $\mathcal{J} = \mathcal{P}$ and, also, the dependence on $\delta$ and $x$ when it is clear from the context. We note that in the case $\mathcal{I} = \mathcal{J}$ we can assume $\nu_j = 0$.³

When we take the fixed global key $\alpha$ and we have $\mathcal{I} = \mathcal{J} = \mathcal{P}$, we simplify notation and write $[\![x]\!] = [x]^{\mathcal{P}}_{\alpha}$. By our comment above we can, in this situation, set $\nu_j = 0$; this means that a $[\![x]\!]$ sharing is given by two sharings $(\langle x\rangle, \langle\mu_\alpha(x)\rangle)$. Notice that the $[\![\cdot]\!]$-representation of a bit $x$ implies that $x$ is authenticated with the global key $\alpha$ and that it is $\langle\cdot\rangle$-shared, i.e. its value is actually unknown to the parties.

This notation does not quite align with the previous secret sharing schemes used in the literature, but it is useful for our purposes. For example, with this notation the MAC scheme of [4] is one where each data element $x$ is shared via $[x_i]^{i}_{\alpha_j, j}$ sharings. Thus the data is shared via a $\langle x\rangle$ sharing and the authentication is performed via $[x_i]^{i}_{\alpha_j, j}$ sharings, i.e. we are using two sharing schemes simultaneously. In [7] the data is shared via our $[\![x]\!]$ notation, except that the MAC key value $\nu$ is set equal to $\nu = \nu' \cdot \alpha$, where $\nu'$ is a public value, as opposed to a shared value. Our $[\![x]\!]$ sharing is, however, identical to that used in [6], bar the differences in the underlying finite fields.

Looking ahead, we say that a bit $[\![x]\!]$ is partially opened if $\langle x\rangle$ is opened, i.e. the parties reveal the shares of $x$, but not the shares of the MAC value $\mu_\alpha(x)$.

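Arithmetic on $[x]$ Shared Values. Given two representations $[x]^{\mathcal{I}_x}_{\delta,\mathcal{J}_x} = (\langle x\rangle^{\mathcal{I}_x}, \langle\mu_\delta(x)\rangle^{\mathcal{I}_x}, \langle\nu_\delta(x)\rangle^{\mathcal{J}_x})$ and $[y]^{\mathcal{I}_y}_{\delta,\mathcal{J}_y} = (\langle y\rangle^{\mathcal{I}_y}, \langle\mu_\delta(y)\rangle^{\mathcal{I}_y}, \langle\nu_\delta(y)\rangle^{\mathcal{J}_y})$ under the same $\delta$, the parties can locally compute

$$[x + y]^{\mathcal{I}_x \cup \mathcal{I}_y}_{\delta,\mathcal{J}_x \cup \mathcal{J}_y} = \left(\langle x\rangle^{\mathcal{I}_x} + \langle y\rangle^{\mathcal{I}_y},\; \langle\mu_\delta(x)\rangle^{\mathcal{I}_x} + \langle\mu_\delta(y)\rangle^{\mathcal{I}_y},\; \langle\nu_\delta(x)\rangle^{\mathcal{J}_x} + \langle\nu_\delta(y)\rangle^{\mathcal{J}_y}\right)$$

using the arithmetic on $\langle\cdot\rangle^{\mathcal{X}}$ sharings above.

² For example, we will also use lines to generate OT-tuples, i.e. quadruples of authenticated bits which satisfy the algebraic equation for a random OT.

³ Otherwise one can subtract $\nu_j$ from $\mu_j$, before setting $\nu_j$ to zero.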

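Let $[\![x]\!] = (\langle x\rangle, \langle\mu(x)\rangle)$ and $[\![y]\!] = (\langle y\rangle, \langle\mu(y)\rangle)$ be two different authenticated bits. Since our sharings are linear, as well as the MACs, it is easy to see that the parties can locally perform linear operations:

$$[\![x]\!] + [\![y]\!] = (\langle x\rangle + \langle y\rangle, \langle\mu(x)\rangle + \langle\mu(y)\rangle) = [\![x + y]\!],$$

$$a \cdot [\![x]\!] = (a \cdot \langle x\rangle, a \cdot \langle\mu(x)\rangle) = [\![a \cdot x]\!],$$

$$a + [\![x]\!] = (a + \langle x\rangle, \langle\mu(a + x)\rangle) = [\![a + x]\!],$$

where $\langle\mu(a + x)\rangle$ is the sharing obtained by each party $i \in \mathcal{P}$ holding the value $\alpha_i \cdot a + \mu_i(x)$.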

This means that the only remaining questions in enabling MPC on $[\![\cdot]\!]$-shared values are how to perform multiplication and how to generate the $[\![\cdot]\!]$-shared values in the first place. Note that a party $P_i$ that wishes to enter a value into the MPC computation wants to obtain a $[x]^{\mathcal{P}}_{\alpha}$ sharing of its input value $x$, and that this is a $[\![x]\!]$-representation if we set $x_i = x$ and $x_j = 0$ for $j \neq i$.
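The local operations on $[\![\cdot]\!]$-shared bits, together with the input convention $x_i = x$, $x_j = 0$, can be checked with a short Python sketch. Assumptions, all hypothetical: a toy field $\mathbb{F}_{2^8}$ in place of $\mathbb{F}_{2^\kappa}$, a trusted dealer standing in for the preprocessing that hands out MAC shares of $x \cdot \alpha$, party index 0 playing the role of $P_1$ (the party that adds public constants), and an `open_and_check` helper that reconstructs $\alpha$ purely for illustration, whereas the protocol verifies MACs with MACCheck and never reveals $\alpha$.

```python
import secrets
from functools import reduce
from operator import xor

K = 8         # toy field F_{2^8}; the paper uses F = F_{2^kappa}
POLY = 0x11B  # hypothetical irreducible modulus


def gf_mul(a, b):
    """Multiply a and b in F_{2^K}: carry-less product reduced mod POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= POLY
    return r


def random_shares(value, n, bits):
    """Additively (XOR) share value among n parties."""
    shares = [secrets.randbits(bits) for _ in range(n - 1)]
    shares.append(reduce(xor, shares, value))
    return shares


def input_bit(x, i, alpha, n):
    """Dealer stand-in: [[x]] with x_i = x, x_j = 0 (j != i), MAC x * alpha shared."""
    xs = [x if j == i else 0 for j in range(n)]
    return xs, random_shares(gf_mul(x, alpha), n, K)


def add(a, b):
    """[[x]] + [[y]]: add bit shares and MAC shares locally."""
    return [p ^ q for p, q in zip(a[0], b[0])], [p ^ q for p, q in zip(a[1], b[1])]


def scale(c, a):
    """c * [[x]] for a public bit c."""
    return [c & s for s in a[0]], [gf_mul(c, m) for m in a[1]]


def add_const(c, a, alpha_shares):
    """c + [[x]]: party 0 adds c to its bit share; each party i adds c * alpha_i."""
    xs = list(a[0])
    xs[0] ^= c
    return xs, [m ^ gf_mul(c, ai) for m, ai in zip(a[1], alpha_shares)]


def open_and_check(a, alpha):
    """Illustration only: opens x and checks mu = x * alpha using alpha itself."""
    x = reduce(xor, a[0], 0)
    return x, reduce(xor, a[1], 0) == gf_mul(x, alpha)


n = 3
alpha_shares = [secrets.randbits(K) for _ in range(n)]
alpha = reduce(xor, alpha_shares, 0)
X = input_bit(1, 0, alpha, n)                        # P_1 inputs x = 1
Y = input_bit(1, 2, alpha, n)                        # P_3 inputs y = 1
Z = add_const(1, add(X, scale(1, Y)), alpha_shares)  # 1 + x + y = 1 in F_2
print(open_and_check(Z, alpha))                      # (1, True)
```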

3 MPC Protocol for Binary Circuits

We start by presenting a high-level view of the protocols that allow us to perform multi-party computation for binary circuits. We assume synchronous communication and authentic point-to-point channels. Our protocol is in the pre-processing model, in which we allow a function- (and input-) independent pre-processing, or offline, phase which produces correlated randomness. This enables a lightweight online phase that does not need public-key machinery.

In the following sections we will describe a protocol, $\Pi_{\mathrm{Online}}$, implementing the actual function evaluation in the $(\mathcal{F}_{\mathrm{Comm}}, \mathcal{F}_{\mathrm{Prep}})$-hybrid model; a protocol, $\Pi_{\mathrm{Prep}}$, implementing the offline phase in the $(\mathcal{F}_{\mathrm{Comm}}, \mathcal{F}_{\mathrm{Bootstrap}})$-hybrid model; and a novel way to authenticate bits to more than two parties, which takes as its starting point the aBit command of [13], and which we model with the $\mathcal{F}_{\mathrm{Bootstrap}}$ functionality.

The online phase implements the standard functionality $\mathcal{F}_{\mathrm{Online}}$ (see Appendix C.2 for details). It is based on the $[\![\cdot]\!]$-representation of bits described in Section 2, and it is very similar to the online phase of other MPC protocols [6,7,8,13]. We compute a function represented as a binary circuit, where private inputs are additively shared among the parties, and correctness is guaranteed by using additive secret sharings of linear MACs with global secret key $\alpha$. For simplicity we assume one single input for each party and one public output. The online protocol, presented in Section 5, uses the linearity of the $[\![\cdot]\!]$-sharings to perform additions and scalar multiplications locally. For general multiplications we need to utilize data produced during the offline phase, in particular the output of the GaOT (Global authenticated OT) command of Section 6. Refer to Figure 2 for a complete description of the functionality for preprocessing data. The aforementioned command GaOT builds upon the $\Pi_{\mathrm{Bootstrap}}$ protocol, described in Section 4, to generate random authenticated OTs and, as we noted above, we skip the less efficient procedures of [13].

Figure 1 Overview of Protocols Enabling MPC

The Functionality $\mathcal{F}_{\mathrm{Prep}}$
Let $A$ be the set of indices of corrupt parties.

Initialize: On input (Init) from honest parties, the functionality samples random $\alpha_i$ for each $i \notin A$. It waits for the environment to input corrupt shares $\{\alpha_j\}_{j \in A}$. If any $j \in A$ outputs abort, then the functionality aborts and returns the set of $j \in A$ which returned abort. Otherwise the functionality sets $\alpha = \alpha_1 + \cdots + \alpha_n$, and outputs $\alpha_k$ to honest $P_k$.

Share: On input $(i, x, \mathrm{Share})$ from party $P_i$, and $(i, \mathrm{Share})$ from all other parties, the functionality produces an authentication $[\![x]\!] = (\langle x\rangle, \langle\mu\rangle)$. It sets $x_j = 0$ if $j \neq i$. Also, the MAC might be shifted by a value $\Delta$, i.e. $\mu = x \cdot \alpha + \Delta$, where $\Delta$ is an $\mathbb{F}_2$-linear combination of $\{\alpha_k\}_{k \notin A}$ not known to the environment. It proceeds as follows:

- Set $\mu = x \cdot \alpha$. If $i \in A$, the environment specifies $x$.

- Wait for the environment to specify MAC shares $\{\mu_j\}_{j \in A}$, and generate $\langle\mu\rangle$ where the portion of honest shares is consistent with the adversarial shares, but otherwise random.

- Set $x_k = 0$ if $k \neq i$, $k \notin A$. If the environment inputs shift-$P_k$, set $\mu_k = \mu_k + \alpha_k$.

- Output $(x_k, \mu_k)$ to honest $P_k$.

GaOT: On input (GaOT) from the parties, the functionality waits for the environment to input “Abort” or “Continue”. If it is told to abort, it outputs the special symbol $\emptyset$ to all parties.

Otherwise it samples three random bits $e, x_0, x_1$, and sets $z = x_e$. Then, for every bit $y \in \{e, z, x_0, x_1\}$ the functionality produces an authentication $[\![y]\!] = (\langle y\rangle, \langle\mu(y)\rangle)$, but lets the environment specify shares for corrupt $P_j$. It proceeds as follows:

- Set $\mu(y) = y \cdot \alpha$.

- Wait for the environment to input bit shares $\{y_j\}_{j \in A}$ and MAC shares $\{\mu_j\}_{j \in A}$, and create sharings $\langle y\rangle$, $\langle\mu\rangle$ where the portion of honest shares is consistent with the adversarial shares.

- Output $(y_k, \mu_k)$ to honest $P_k$.

Figure 2 Ideal Preprocessing
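The algebraic relation a GaOT tuple satisfies can be illustrated with a minimal sketch of the honest-parties case of the GaOT command in Figure 2. There is no environment, no corruption and no abort handling; the field $\mathbb{F}_{2^8}$ and its modulus are toy stand-ins for $\mathbb{F}_{2^\kappa}$, and all names are hypothetical. Each of $e, z, x_0, x_1$ is additively shared together with MAC shares of $y \cdot \alpha$, and the OT relation $z = x_e$ can be written over $\mathbb{F}_2$ as $z = x_0 + e \cdot (x_0 + x_1)$.

```python
import secrets
from functools import reduce
from operator import xor

K = 8         # toy field F_{2^8} in place of F_{2^kappa}
POLY = 0x11B  # hypothetical irreducible modulus


def gf_mul(a, b):
    """Multiply a and b in F_{2^K}: carry-less product reduced mod POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= POLY
    return r


def auth_bit(y, alpha, n):
    """Honest-case [[y]]: XOR shares of y and of its MAC y * alpha."""
    xs = [secrets.randbits(1) for _ in range(n - 1)]
    xs.append(reduce(xor, xs, y))
    ms = [secrets.randbits(K) for _ in range(n - 1)]
    ms.append(reduce(xor, ms, gf_mul(y, alpha)))
    return xs, ms


def ga_ot(alpha, n):
    """Honest-case GaOT: authenticated bits (e, z, x0, x1) with z = x_e."""
    e, x0, x1 = (secrets.randbits(1) for _ in range(3))
    z = x1 if e else x0
    return {name: auth_bit(v, alpha, n)
            for name, v in (("e", e), ("z", z), ("x0", x0), ("x1", x1))}


def open_bit(sh):
    """Reconstruct (bit, MAC) from an authenticated sharing."""
    return reduce(xor, sh[0], 0), reduce(xor, sh[1], 0)


n = 3
alpha = secrets.randbits(K)
vals = {k: open_bit(v) for k, v in ga_ot(alpha, n).items()}
e, z, x0, x1 = (vals[k][0] for k in ("e", "z", "x0", "x1"))
print(z == x0 ^ (e & (x0 ^ x1)),
      all(mu == gf_mul(b, alpha) for b, mu in vals.values()))
```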


Notice that, as in [6,7,8,13], during the online computation of the circuit we do not know whether we are working with the correct values, since we do not check the MACs of partially opened values during the computation. This check is postponed to the end of the protocol, where we call the MACCheck procedure as in [6] (see Appendix B for details). Note that this procedure enables the checking of multiple sets of values partially opened during the computation without revealing the global secret key $\alpha$; thus our MPC protocol can implement reactive functionalities.

The MAC checking protocol is called in both the offline and the online phases; it requires access to an ideal functionality for commitments $\mathcal{F}_{\mathrm{Comm}}$, also given in Appendix B, and it is not intended to implement any functionality. Also, note that the algebraic correctness of the output of the GaOT command is checked in the offline phase and not in the online phase.

4 From Tiny-OT aBits to $[\![\cdot]\!]$-Sharings

At the heart of our MPC protocol is a method to translate from the two-party aBits produced by the offline phase of the Tiny-OT protocol in [13] to the $[\![\cdot]\!]$-sharings under some global shared key $\alpha$ from Section 2. We note that the protocol to produce aBits is the only sub-protocol from [13] which we use in this paper; we thus discard the more complex protocols in [13] for producing aOTs and aANDs. We first deal with the underlying two-party sub-protocols, and then we use these to define our multi-party protocols.

4.1  Two-party  [•]-representations. 

Throughout, we assume access to an ideal functionality $\mathcal{F}_{\mathrm{aBit}}$, given in Figure 3, that produces an essentially unbounded number of (oblivious) authenticated random bits for two parties, under some randomly chosen key $\delta_j$ known by one of the parties. This functionality can be implemented assuming a functionality $\mathcal{F}_{\mathrm{OT}}$ and using OT-extension techniques as in [13]. For ease of exposition we present the functionality as returning single bits for single requests. In practice the functionality is implemented via OT-extension, and so one is able to obtain many aBits on each invocation of the functionality, for a given value of $\delta_j$. Adapting our protocols to deal with multiple aBit production for a single random fixed $\delta_j$ chosen by the functionality is left to the reader⁴.

The Functionality $\mathcal{F}_{\mathrm{aBit}}$

Authenticated Bit $(P_i, P_j)$: This functionality selects a random $\delta_j \in \mathbb{F}$ and a random bit $r$, and returns a sharing $[r]^{i}_{\delta_j, j}$.

- On input $(\mathrm{aBit}, i, j)$ from honest $P_i$ and $P_j$, the functionality samples a random $\delta_j$ and a random sharing $[r]^{i}_{\delta_j, j} = (r, \mu_i, \nu_j)$ such that $\mu_i = \nu_j + r \cdot \delta_j$. It then outputs $(r, \mu_i)$ to $P_i$ and $(\delta_j, \nu_j)$ to $P_j$.

- If $P_i$ is corrupted, the functionality waits for the environment to input the pair $(r, \mu_i)$ and it sets $\nu_j = \mu_i + r \cdot \delta_j$ for some randomly chosen $\delta_j$, and $(\delta_j, \nu_j)$ is returned to party $P_j$.

- If $P_j$ is corrupted, the functionality waits for the environment to input the pair $(\delta_j, \nu_j)$, $r$ is selected at random and $\mu_i$ is set to be $\nu_j + r \cdot \delta_j$. The pair $(r, \mu_i)$ is returned to party $P_i$.

Figure 3 Two-party Bit Authentication [13]
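For intuition, here is a minimal sketch of the honest-parties case of $\mathcal{F}_{\mathrm{aBit}}$, in the batched form described in the text: one fresh $\delta_j$ per invocation, reused for all the bits produced by that invocation. It models only the ideal functionality's outputs, not the OT-extension protocol that realizes it; the toy field $\mathbb{F}_{2^8}$ and the function names are hypothetical.

```python
import secrets

K = 8         # toy field F_{2^8} in place of F
POLY = 0x11B  # hypothetical irreducible modulus


def gf_mul(a, b):
    """Multiply a and b in F_{2^K}: carry-less product reduced mod POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= POLY
    return r


def f_abit(count=1):
    """Honest-case F_aBit between P_i and P_j.

    Returns P_i's outputs [(r, mu_i), ...] and P_j's outputs (delta_j, [nu_j, ...])
    with mu_i = nu_j + r * delta_j. One delta_j is sampled per invocation and
    reused for all `count` bits of that invocation.
    """
    delta_j = secrets.randbits(K)
    out_i, nus = [], []
    for _ in range(count):
        r = secrets.randbits(1)
        nu_j = secrets.randbits(K)
        out_i.append((r, nu_j ^ gf_mul(r, delta_j)))
        nus.append(nu_j)
    return out_i, (delta_j, nus)


out_i, (delta_j, nus) = f_abit(count=4)
print(all(mu == nu ^ gf_mul(r, delta_j) for (r, mu), nu in zip(out_i, nus)))
```

Because $\nu_j$ is uniform and independent of $r$, $P_i$'s MAC $\mu_i$ alone reveals nothing about $\delta_j$, while $P_j$'s pair $(\delta_j, \nu_j)$ reveals nothing about $r$.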


Using the protocol $\Pi_{\text{2-Share}}$ described in Protocol 4, we can obtain a “two-party” representation $[r]^{i}_{\delta_j, j}$ of a random bit known to $P_i$, under the key chosen by $P_j$. This extension is needed because we have to adapt the aBit command to the multi-party case. For example, if two parties, $P_i$ and $P_j$, run the command aBit$(i, j)$, they obtain a random $[r]^{i}_{\delta_j, j}$ with respect to $\delta_j$; when $P_j$ calls aBit$(k, j)$ with a different party $P_k$, $k \neq i$, then they obtain a random $[s]^{k}_{\delta'_j, j}$ with a different $\delta'_j$. Thus allowing the parties to select their own values of $\delta_j$ means that we can obtain key-consistent $[\cdot]$-representations, in which each party $P_j$ uses the same fixed $\delta_j$. The security of the protocol $\Pi_{\text{2-Share}}$ follows from the security of the original aBit in [13]: intuitively, the changes required to obtain a consistent $[\cdot]$-representation do not compromise security, because $\delta_j$ is one-time-padded with the random $\delta'_j$ produced by $\mathcal{F}_{\mathrm{aBit}}$. See C.1 for details.

Notice  that  the  command  2-Share  takes  Sj  as  the  input  of  Pj.  In  particular  the  value  Sj  may 
not  be  used  to  authenticate  bits.  Thus  we  could  use  the  protocol  il2-Share  to  obtain  a  sharing  of 
the  scalar  product  r  ■  5j,  where  Pj  obtains  the  random  bit  r,  and  the  other  party  decides  what  field 
element  5j  G  F  gets  multiplied  in.  Then  party  Pi  obtains  the  result  pi  masked  by  a  one-time  pad 

*  Note,  that  in  this  situation  we  (say)  produce  1,  000,  000  a  Bits  per  invocation  with  a  fixed  random  value  of  Sj,  then 
on  the  next  invocation  we  obtain  another  1,  000, 000  aBits  but  with  a  new  random  Sj  value.  This  is  not  explicit  in 
the  ideal  functionality  description  of  aBit  presented  in  [13],  but  is  implied  by  their  protocol. 
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The  Subprotocol  il2-Share 

2Share(i,  j;  (5j):  On  input  (2-Share,  i,  j,  5i),  where  Pj  has  Sj  £  F  as  input,  this  command  produces  a  [r]s^j 
sharing  of  a  random  bit  r. 

1.  Pi  and  Pj  call  on  input  (aBit,  i,  j):  The  box  samples  a  random  Sj  and  then  produces 

[r]s'.,j  = 

3 

such  that  fi'i  =  Vj  +  r  ■  Sj,  and  outputs  {r,  /r(}  to  Pi  and  {Sj,  Vj}  to  Pj. 

2.  Pj  computes  aj  =  Sj  +  S'j  and  sends  aj  to  party  Pi. 

3.  Pi  sets  fj.i  =  fi'i  +  r  ■  aj  =  Uj  +  r  ■  Sj. 


Protocol 4 Switching to Fixed $\delta$-shares


value $\nu_j$ known only to $P_j$. This application of the subprotocol $\Pi_{\mathsf{2\text{-}Share}}$ is going to be crucial in our method to obtain authenticated OTs in our pre-processing phase. As a consequence, we do not always regard $\delta_j$ as an authentication key.
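To make the re-keying step of $\Pi_{\mathsf{2\text{-}Share}}$ concrete, the following Python sketch simulates one honest run. It is only an illustration under stated assumptions: elements of $\mathbb{F} = \mathbb{F}_{2^{64}}$ are represented as 64-bit Python integers (the bit-length is an assumption), so field addition is XOR and a bit times a field element needs no field multiplication. `fake_abit` is a hypothetical, insecure stand-in for $\mathcal{F}_{\mathsf{aBit}}$ that simply samples a correlated tuple locally.

```python
import secrets

K = 64  # assumed bit-length: elements of F = GF(2^K) are K-bit ints, addition is XOR


def rand_elem() -> int:
    return secrets.randbits(K)


def bit_times(r: int, x: int) -> int:
    """Bit r times field element x: either x or 0, so no field multiplication is needed."""
    return x if r else 0


def fake_abit():
    """Hypothetical, insecure stand-in for F_aBit: mu'_i = nu_j + r * delta'_j."""
    r = secrets.randbits(1)
    delta_prime = rand_elem()          # fresh random key chosen by the box
    nu = rand_elem()
    mu_prime = nu ^ bit_times(r, delta_prime)
    return (r, mu_prime), (delta_prime, nu)   # outputs of P_i and of P_j


def two_share(delta_j: int):
    """One honest run of Pi_2-Share with P_j's chosen key delta_j."""
    (r, mu_prime), (delta_prime, nu) = fake_abit()   # step 1
    sigma = delta_j ^ delta_prime                      # step 2: P_j sends sigma_j to P_i
    mu = mu_prime ^ bit_times(r, sigma)                # step 3: P_i adjusts its value
    return (r, mu), nu


delta_j = rand_elem()
(r, mu), nu = two_share(delta_j)
assert mu == nu ^ bit_times(r, delta_j)   # mu_i = nu_j + r * delta_j
```

The check works because multiplication by a bit is $\mathbb{F}_2$-linear: $r\cdot\delta'_j + r\cdot(\delta_j + \delta'_j) = r\cdot\delta_j$, so the fresh key $\delta'_j$ cancels.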


4.2 Multiparty $[\cdot]$-representations


The Functionality $\mathcal{F}_{\mathsf{Bootstrap}}$

Let $A$ be the indices of corrupt parties.

Initialize: On input (Init) from honest parties, the functionality activates and waits for the environment to input a set of shares $\{\delta_j\}_{j \in A}$. It samples a random $\delta \in \mathbb{F}$ and prepares a sharing $\langle\delta\rangle$, where the portions of honest shares are consistent with the adversarial shares, but otherwise random. If any $j \in A$ outputs abort, then the functionality aborts and returns the set of $j \in A$ which returned abort; otherwise it continues.

Share: On input $(i, x, \mathsf{Share})$ from party $P_i$, and $(i, \mathsf{Share})$ from all other parties, the functionality produces a representation $[x]^i_\delta = (x, \mu, \langle\nu\rangle)$, except that $\nu$ might be shifted by a value $\Delta_h$, i.e. $\mu = x\cdot\delta + \nu + \Delta_h$, where $\Delta_h$ is an $\mathbb{F}_2$-linear combination of $\{\delta_k\}_{k \notin A}$ which is not known to the environment. It proceeds as follows:

- It samples a random $\nu \in \mathbb{F}$. If $i \in A$, it waits for the environment to input $(\nu, x)$.

- The functionality sets $\mu = x\cdot\delta + \nu$.

- The functionality waits for the environment to input shares $\{\nu_j\}_{j \in A}$, and prepares a sharing $\langle\nu\rangle$ consistent with the adversarial shares. The portions of honest shares are otherwise random.

- If the environment inputs shift-$P_k$, the functionality sets $\nu_k = \nu_k + \delta_k$, $k \notin A$.

- It outputs $(\nu_k, \delta_k)$ to honest $P_k$.

Figure 5 Ideal Generation of $[\cdot]^i_\delta$-representations


Here we show how to generalize the $\Pi_{\mathsf{2\text{-}Share}}$ protocol in order to obtain an $n$-party representation $[x]^i_\delta$ of a bit $x$ chosen by $P_i$. This is what the functionality $\mathcal{F}_{\mathsf{Bootstrap}}$ models in Figure 5. It bootstraps from a two-party authentication to a multi-party authentication of the shared bit. As before for $\Pi_{\mathsf{2\text{-}Share}}$, we can see the outputs of $\mathcal{F}_{\mathsf{Bootstrap}}$ as the shares of scalar products $x\cdot\delta$, where one party $P_i$ chooses the scalar (bit) $x$, but now the field element $\delta$ is unknown and additively shared among all the parties. An interesting feature of this functionality is that the adversary can only influence honest outputs in a small way, which we model with the shift-$P_k$ flag. Additionally, we cannot prevent corrupt parties from outputting what they wish; this is reflected in the fact that the functionality leaves their outputs undefined. The main difference between this functionality and the equivalent in



APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


the SPDZ protocol [7] is that in [7] the functionality takes as input an offset known to the adversary, who adjusts his shares to obtain an invalid MAC value by this linear amount. We do not model this in our functionality; instead we allow the adversary to choose his shares arbitrarily (which obtains the same effect). However, in our protocol the adversary can also introduce an error into the MAC values that is unknown to the adversary. In particular, the adversary can decide whether to shift honest shares, but he cannot choose the shift, namely an element of the $\mathbb{F}_2$-span of the secrets $\delta_k$ of honest parties $P_k$. Later, we determine whether there are any errors (both adversarially known and unknown ones) using an information-theoretic MACCheck procedure that we borrow from [6]. See Appendix B for details.

The protocol $\Pi_{\mathsf{Bootstrap}}$, described in Protocol 6, realizes the ideal functionality $\mathcal{F}_{\mathsf{Bootstrap}}$ in a hybrid model in which we are given access to $\mathcal{F}_{\mathsf{aBit}}$. It produces $[x]^i_\delta$ by sending to each $P_j$, $j \neq i$, a mask of $x$ using the random bits given by 2-Share$(i, j; \delta_j)$ as paddings, and then allowing $P_j$ to adjust his share to the right value. In total the protocol needs to execute $n-1$ aBits per scalar product.

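The Protocol $\Pi_{\mathsf{Bootstrap}}$

Initialize: Each party $P_i$ samples a random $\delta_i$. Define $\delta = \delta_1 + \cdots + \delta_n$.

Share: On input $(i, x, \mathsf{Share})$ from $P_i$ and $(i, \mathsf{Share})$ from all other parties, do:

1. For each $j \neq i$, call $\Pi_{\mathsf{2\text{-}Share}}$ with 2-Share$(i, j; \delta_j)$. Party $P_i$ obtains $\{r_{i,j}, \mu_{i,j}\}_{j \neq i}$ whilst party $P_j$ obtains $\nu_{i,j}$, such that $\mu_{i,j} = \nu_{i,j} + r_{i,j}\cdot\delta_j$.

2. Party $P_i$ samples $\varepsilon$ at random and sets $\nu_i = \varepsilon + \sum_{j \neq i} \mu_{i,j}$ and $\mu = \varepsilon + x\cdot\delta_i$.

3. Party $P_i$ sends $d_j = x + r_{i,j}$ to party $P_j$ for all $j \neq i$.

4. For $j \neq i$, $P_j$ sets $\nu_j = \nu_{i,j} + d_j\cdot\delta_j$.

5. Output $(\mu, \nu_i, \delta_i)$ to $P_i$ and $(\nu_j, \delta_j)$ to party $P_j$, for $j \neq i$. The system now has $[x]^i_\delta$.

Protocol 6 Transforming Two-party Representations into $[\cdot]^i_\delta$-representations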
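The algebra behind Protocol 6 can be checked with a small honest-execution simulation. This is a sketch, not an implementation of the protocol: field elements are 64-bit integers with XOR as addition (an assumption), `two_share` is a hypothetical local stand-in that returns an honest output of $\Pi_{\mathsf{2\text{-}Share}}$, and no messages are actually exchanged. The final assertion checks that $\mu = x\cdot\delta + \sum_k \nu_k$, i.e. that the outputs form a $[x]^i_\delta$-representation.

```python
import secrets
from functools import reduce
from operator import xor

K = 64  # assumed bit-length of field elements (GF(2^K) as ints, addition is XOR)


def rand_elem() -> int:
    return secrets.randbits(K)


def bit_times(b: int, x: int) -> int:
    return x if b else 0


def two_share(delta_j: int):
    """Honest output of Pi_2-Share: (r, mu) for P_i and nu for P_j, mu = nu + r*delta_j."""
    r, nu = secrets.randbits(1), rand_elem()
    return r, nu ^ bit_times(r, delta_j), nu


def bootstrap_share(i: int, x: int, deltas: list[int]):
    """Honest run of the Share command of Pi_Bootstrap for the bit x of P_i.

    Returns P_i's value mu and the list of every party's share nu_k."""
    n = len(deltas)
    eps = rand_elem()                               # step 2: P_i's random epsilon
    nus = [0] * n
    nus[i] = eps
    for j in range(n):
        if j == i:
            continue
        r_ij, mu_ij, nu_ij = two_share(deltas[j])   # step 1
        nus[i] ^= mu_ij                             # step 2: nu_i = eps + sum_j mu_ij
        d_j = x ^ r_ij                              # step 3: masked bit sent to P_j
        nus[j] = nu_ij ^ bit_times(d_j, deltas[j])  # step 4
    mu = eps ^ bit_times(x, deltas[i])              # step 2: mu = eps + x*delta_i
    return mu, nus


deltas = [rand_elem() for _ in range(4)]            # Initialize with n = 4 parties
delta = reduce(xor, deltas)
for x in (0, 1):
    mu, nus = bootstrap_share(1, x, deltas)
    assert mu == bit_times(x, delta) ^ reduce(xor, nus)   # mu = x*delta + nu
```

Each pair $\mu_{i,j} + \nu_j$ collapses to $x\cdot\delta_j$ because $r_{i,j}\cdot\delta_j$ appears twice, which is why $P_i$ only has to add $x\cdot\delta_i$ itself.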


Lemma 1. In the $\mathcal{F}_{\mathsf{aBit}}$-hybrid model, the protocol $\Pi_{\mathsf{Bootstrap}}$ implements $\mathcal{F}_{\mathsf{Bootstrap}}$ with perfect security against any static adversary corrupting up to $n-1$ parties.

Proof. See Appendix C.1.

5  The  Online  Phase 

In this section we present the protocol $\Pi_{\mathsf{Online}}$, described in Protocol 7, which implements the online functionality in the $(\mathcal{F}_{\mathsf{Comm}}, \mathcal{F}_{\mathsf{Prep}})$-hybrid model. The basic idea behind our online phase is to use the set of GaOTs output in the offline phase to evaluate each multiplication gate. To see how this is done, consider that we want to multiply two authenticated bits $[\![a]\!]$, $[\![b]\!]$. The parties take a GaOT tuple $\{[\![e]\!], [\![z]\!], [\![x_0]\!], [\![x_1]\!]\}$ off the pre-computed list. Recall that for such tuples we have $z = x_e$. It is then relatively straightforward to compute authenticated shares $[\![c]\!]$, where $c = a\cdot b$, as follows. First, the parties partially open $[\![f]\!] = [\![b]\!] + [\![e]\!]$ and $[\![g]\!] = [\![x_0]\!] + [\![x_1]\!] + [\![a]\!]$, and then set $[\![c]\!] = [\![x_0]\!] + f\cdot[\![a]\!] + g\cdot[\![e]\!] + [\![z]\!]$. To see why this is correct, note that since $z + x_0 + e\cdot(x_0 + x_1) = 0$, we have $c = x_0 + (b + e)\cdot a + (x_0 + x_1 + a)\cdot e + z = a\cdot b$.
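The multiplication identity above can be checked exhaustively. The sketch below XOR-shares bits among three parties (an arbitrary choice), omits the MACs entirely, and builds the GaOT tuple locally in place of $\mathcal{F}_{\mathsf{Prep}}$; all function names are hypothetical. Only $f$ and $g$ are opened, and everything else is a local linear operation on shares.

```python
import itertools
import secrets
from functools import reduce
from operator import xor

N_PARTIES = 3  # assumed number of parties for the illustration


def share(bit: int) -> list[int]:
    """XOR sharing of a bit among N_PARTIES parties; MACs are omitted."""
    shares = [secrets.randbits(1) for _ in range(N_PARTIES - 1)]
    return shares + [bit ^ reduce(xor, shares, 0)]


def open_(shares: list[int]) -> int:
    return reduce(xor, shares, 0)


def add(a: list[int], b: list[int]) -> list[int]:
    return [x ^ y for x, y in zip(a, b)]


def scale(c: int, a: list[int]) -> list[int]:
    """Multiply a shared bit by a public bit c."""
    return [c & x for x in a]


def multiply(a, b, gaot):
    e, z, x0, x1 = gaot
    f = open_(add(b, e))                       # f = b + e
    g = open_(add(add(x0, x1), a))             # g = x0 + x1 + a
    return add(add(x0, scale(f, a)), add(scale(g, e), z))   # c = x0 + f*a + g*e + z


def random_gaot():
    e, x0, x1 = (secrets.randbits(1) for _ in range(3))
    z = x1 if e else x0                        # z = x_e
    return share(e), share(z), share(x0), share(x1)


for a, b in itertools.product((0, 1), repeat=2):
    c = multiply(share(a), share(b), random_gaot())
    assert open_(c) == a & b
```

The opened values $f = b + e$ and $g = x_0 + x_1 + a$ are one-time-padded by the random bits $e$ and $x_0 + x_1$ of the tuple, so each GaOT can be used for only one multiplication.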

Theorem 1. In the $(\mathcal{F}_{\mathsf{Comm}}, \mathcal{F}_{\mathsf{Prep}})$-hybrid model, the protocol $\Pi_{\mathsf{Online}}$ securely implements $\mathcal{F}_{\mathsf{Online}}$ against any static adversary corrupting up to $n-1$ parties, assuming protocol MACCheck utilizes a secure pseudo-random function $\mathsf{PRF}^{\mathbb{F},t}_s(\cdot)$.
Protocol $\Pi_{\mathsf{Online}}$

Initialize: The parties call Init on the $\mathcal{F}_{\mathsf{Prep}}$ functionality to get the shares $\alpha_i$ of the global MAC key $\alpha$. If $\mathcal{F}_{\mathsf{Prep}}$ aborts outputting a set of corrupted parties, then the protocol returns this subset of $A$. Otherwise the operations specified below are performed according to the circuit.

Input: To share his input bit $x$, $P_i$ calls $\mathcal{F}_{\mathsf{Prep}}$ with input $(i, x, \mathsf{Share})$ and party $P_j$ for $j \neq i$ calls $\mathcal{F}_{\mathsf{Prep}}$ with input $(i, \mathsf{Share})$. The parties obtain $[\![x]\!]$, where the $x$-share of $P_j$ is set to zero if $j \neq i$.

Add: On input $([\![a]\!], [\![b]\!])$, the parties locally compute $[\![a + b]\!] = [\![a]\!] + [\![b]\!]$.

Multiply: On input $([\![a]\!], [\![b]\!])$, the parties call $\mathcal{F}_{\mathsf{Prep}}$ on input (GaOT), obtaining a random GaOT tuple $\{[\![e]\!], [\![z]\!], [\![x_0]\!], [\![x_1]\!]\}$. The parties then perform:

1. The parties locally compute $[\![f]\!] = [\![b]\!] + [\![e]\!]$ and $[\![g]\!] = [\![x_0]\!] + [\![x_1]\!] + [\![a]\!]$.

2. The shares $[\![f]\!]$ and $[\![g]\!]$ are partially opened.

3. The parties locally compute

$$[\![c]\!] = [\![x_0]\!] + f\cdot[\![a]\!] + g\cdot[\![e]\!] + [\![z]\!].$$

Output: This procedure is entered once the parties have finished the circuit evaluation, but the final output $[\![y]\!]$ has not yet been opened.

1. The parties call the protocol $\Pi_{\mathsf{MACCheck}}$ on input of all the values partially opened so far. If it fails, they output $\emptyset$ and abort. $\emptyset$ represents the fact that the corrupted parties remain undetected in this case.

2. The parties partially open $[\![y]\!]$ and call $\Pi_{\mathsf{MACCheck}}$ on input $y$ to verify its MAC. If the check fails, they output $\emptyset$ and abort; otherwise they accept $y$ as a valid output.


Protocol 7 Secure Function Evaluation in the $(\mathcal{F}_{\mathsf{Comm}}, \mathcal{F}_{\mathsf{Prep}})$-hybrid Model


Proof.  See  Appendix  C.2. 


6  The  Offline  Phase 

Here we present our offline protocol $\Pi_{\mathsf{Prep}}$ (Protocol 8). The key part of this protocol is the GaOT command. In [13] the authors give a two-party protocol to enable one party, say A, to obtain two authenticated bits $e, z$, and the other party, say B, to obtain two authenticated secret bits $x_0, x_1$, such that $z = x_e$ and $e$, $x_0$ and $x_1$ are chosen at random. We generalize such a procedure to many parties and obtain sharings $[\![e]\!]$, $[\![z]\!]$, $[\![x_0]\!]$, $[\![x_1]\!]$, subject to $z = x_e$. Notice that the values $e, z, x_0, x_1$ are not known to any party, so they can be used in the online phase to implement multiplication gates.

The idea behind the GaOT command is to exploit the relation between "affine functions" and "selector functions", in which a bit $e$ selects one of two elements $(x_0, x_1)$. This connection was already noted in [1] in the context of garbling arithmetic circuits via randomized encodings. Thus, on one hand we have authentications, which are essentially evaluations of affine functions, and on the other we have OT quadruples, which can be seen as selectors. Seeing both as the same object means that a way to authenticate bits also gives us a way to generate OTs, and the other way around. The procedure is broken into three steps: Share OT, Authenticate OT and Sacrifice OT. We examine these three stages in turn.

To produce bit quadruples $(e, z, x_0, x_1)$ such that $z = x_e$, the parties will use a (secret) affine line in $\mathbb{F}$ parametrized by $(\vartheta, \eta)$. Note that with our functionality $\mathcal{F}_{\mathsf{Bootstrap}}$ we get $[e_i]^i_\eta$, where $e_i$ is known to $P_i$, and an additive sharing $\langle\eta\rangle$ is held by the system. We denote this concrete execution of the functionality as $\mathcal{F}_{\mathsf{Bootstrap}}(\eta)$, since we shall use fresh copies of $\mathcal{F}_{\mathsf{Bootstrap}}$ to generate more OT quadruples and also for authentication purposes. Note that $\eta$ is not an input to the functionality but a shared random value produced when initialising the functionality. Now, performing $n$ independent queries of the Share command on this copy $\mathcal{F}_{\mathsf{Bootstrap}}(\eta)$, the parties can generate

$$[e]_\eta = [e_1]^1_\eta + \cdots + [e_n]^n_\eta \qquad (1)$$




The Protocol $\Pi_{\mathsf{Prep}}$

Let  A  be  the  set  of  indices  of  corrupt  parties. 

Initialize: On input (Init) from honest parties and adversary, the system runs a copy of $\mathcal{F}_{\mathsf{Bootstrap}}$ which is denoted $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$. Then it calls Init on $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$. If $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$ aborts, outputting a set of corrupted parties, then the protocol returns this subset of $A$ and aborts. Otherwise, the values $\delta_i$ returned by $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$ are labelled as $\alpha_i$. Set $\alpha = \alpha_1 + \cdots + \alpha_n$ and output $\alpha_i$ to honest parties $P_i$.

Share: On input $(i, x, \mathsf{Share})$ from party $P_i$ and $(i, \mathsf{Share})$ from all parties $P_j$, $j \neq i$, the protocol calls the Share command of $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$ to obtain $[x]^i_\alpha$, given by $(\mu, \langle\nu\rangle)$. Then, for $j \neq i$, party $P_j$ sets his share of $x$ to be zero, and $\mu_j(x) = \nu_j$. Party $P_i$ sets $\mu_i(x) = \mu + \nu_i$. Thus, the parties obtain $[\![x]\!]$.

GaOT:  On  input  (GaOT)  from  all  Pi,  execute  the  following  sub-procedures: 

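Share OT. This generates sharings $(\langle e\rangle, \langle z\rangle, \langle x_0\rangle, \langle x_1\rangle)$ such that $x_0$, $x_1$ and $e$ are random bits. If all parties are honest, then $z = x_e$ holds.

1. The system runs a fresh copy of $\mathcal{F}_{\mathsf{Bootstrap}}$ on the Init command, getting an additive sharing $\langle\eta\rangle$ for some random $\eta \in \mathbb{F}$. Denote this copy as $\mathcal{F}_{\mathsf{Bootstrap}}(\eta)$.

2. Each party $P_i$ samples a random bit $e_i$. Define $e = e_1 + \cdots + e_n$.

3. For each $i = 1, \ldots, n$, the system calls $\mathcal{F}_{\mathsf{Bootstrap}}(\eta)$ on input $(i, e_i, \mathsf{Share})$ from party $P_i$ and input $(i, \mathsf{Share})$ from every other $P_j$, to obtain $[e_i]^i_\eta$. That is, (in an honest execution) $P_i$ gets $\zeta_i \in \mathbb{F}$, and the parties get an additive sharing $\langle\vartheta_i\rangle$ of some unknown $\vartheta_i \in \mathbb{F}$, such that $\zeta_i = \vartheta_i + e_i\cdot\eta$. The parties compute $[e]_\eta = [e_1]^1_\eta + \cdots + [e_n]^n_\eta$.

4. At this point of the protocol, the system holds sharings $\langle e\rangle$, $\langle\zeta\rangle$, $\langle\vartheta\rangle$, $\langle\eta\rangle$, so it can derive $\langle x_0\rangle = \langle\vartheta\rangle$ and $\langle x_1\rangle = \langle\vartheta\rangle + \langle\eta\rangle$. Note that (for an honest execution) $\zeta = \vartheta + e\cdot\eta$, or in other words $\zeta = x_e$.

5. Each party $P_i$ sets $z_i$, $x_{0,i}$, $x_{1,i}$ to be the least significant bits of $\zeta_i$, $x_{0,i}$, $x_{1,i}$ respectively, so as to obtain sharings $\langle z\rangle$, $\langle x_0\rangle$ and $\langle x_1\rangle$.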

Authenticate  OT.  This  step  produces  authentications  on  the  bits  previously  computed. 

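For every bit $y \in \{e, z, x_0, x_1\}$ it does the following:

6. For each $i = 1, \ldots, n$, call $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$ on input $(i, y_i, \mathsf{Share})$ from $P_i$ and $(i, \mathsf{Share})$ from every other party $P_j$ to obtain $[y_i]^i_\alpha$.

7. Compute $[\![y]\!]$ by forming $[y]_\alpha = [y_1]^1_\alpha + \cdots + [y_n]^n_\alpha$ and then subtracting $\nu(y)$ from $\mu(y)$.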

Sacrifice OT. This step checks that the authenticated OT-quadruples are correct. Let $[\![e]\!]$, $[\![z]\!]$, $[\![x_0]\!]$, $[\![x_1]\!]$ be the quadruple to check, and $\kappa$ a security parameter:

8. Every party $P_i$ samples a seed $s_i$ and asks $\mathcal{F}_{\mathsf{Comm}}$ to broadcast $\tau_i = \mathsf{Comm}(s_i)$.

9. Every party $P_i$ calls $\mathcal{F}_{\mathsf{Comm}}$ with $\mathsf{Open}(\tau_i)$ and all parties obtain $s_j$ for all $j$. Set $s = s_1 + \cdots + s_n$.

10. Parties sample a random vector $t = \mathsf{PRF}^{\mathbb{F}_2,\kappa}_s(0) \in \mathbb{F}_2^\kappa$. Note that all parties obtain the same vector as they have agreed on the seed $s$.

11. For $i = 1, \ldots, \kappa$ repeat the following:

- Take one fresh quadruple $[\![e_i]\!]$, $[\![z_i]\!]$, $[\![x_{0,i}]\!]$, $[\![x_{1,i}]\!]$ and partially open the values $p_i = t_i\cdot([\![x_0]\!] + [\![x_1]\!]) + [\![x_{0,i}]\!] + [\![x_{1,i}]\!]$ and $q_i = [\![e]\!] + [\![e_i]\!]$.

- Locally evaluate $[\![c_i]\!]$ such that

$$[\![c_i]\!] = t_i\cdot([\![z]\!] + [\![x_0]\!]) + [\![z_i]\!] + [\![x_{0,i}]\!] + p_i\cdot[\![e]\!] + q_i\cdot([\![x_{0,i}]\!] + [\![x_{1,i}]\!]),$$

and check that it partially opens to zero. If it does not, then abort.

12. The parties call $\Pi_{\mathsf{MACCheck}}$ on the values partially opened in step 11.

13. If no abort occurs, output $[\![e]\!]$, $[\![z]\!]$, $[\![x_0]\!]$, $[\![x_1]\!]$ as a valid quadruple.


Protocol 8 Preprocessing: Input Sharing and Creation of OT Quadruples in the $\mathcal{F}_{\mathsf{Bootstrap}}$-hybrid Model
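The Share command of Protocol 8 only relabels the output of $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$. The minimal sketch below assumes an honest, locally simulated $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$ (the hypothetical `ideal_bootstrap_alpha`) that returns $\mu = x\cdot\alpha + \nu$ together with an additive sharing of $\nu$. It shows that the MAC shares of the resulting $[\![x]\!]$ add up to $x\cdot\alpha$. As before, field elements are 64-bit integers with XOR as addition, which is an assumption made for illustration only.

```python
import secrets
from functools import reduce
from operator import xor

K, N = 64, 3   # assumed field bit-length and number of parties


def rand_elem() -> int:
    return secrets.randbits(K)


def additive_sharing(value: int) -> list[int]:
    shares = [rand_elem() for _ in range(N - 1)]
    return shares + [value ^ reduce(xor, shares, 0)]


def ideal_bootstrap_alpha(x: int, alpha: int):
    """Honest F_Bootstrap(alpha) Share output: mu for P_i and a sharing <nu>."""
    nu = rand_elem()
    return (alpha if x else 0) ^ nu, additive_sharing(nu)


def share_command(i: int, x: int, alpha: int):
    """Share command of Pi_Prep: turn [x]^i_alpha into [[x]] = (<x>, <mu(x)>)."""
    mu, nu_sh = ideal_bootstrap_alpha(x, alpha)
    x_sh = [x if k == i else 0 for k in range(N)]                       # only P_i holds x
    mac_sh = [nu ^ mu if k == i else nu for k, nu in enumerate(nu_sh)]  # mu_i(x) = mu + nu_i
    return x_sh, mac_sh


alpha = rand_elem()
for x in (0, 1):
    x_sh, mac_sh = share_command(0, x, alpha)
    assert reduce(xor, x_sh) == x
    assert reduce(xor, mac_sh) == (alpha if x else 0)   # sum of MAC shares = x * alpha
```

Since the MAC shares are linear in $x$, adding two such sharings share-wise gives a valid sharing of the sum, which is the homomorphic property the online phase relies on.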


Thus, the system obtains two (secret) elements $\langle e\rangle$, $\langle\zeta\rangle$, such that $\zeta = \vartheta + e\cdot\eta$, for the line $(\langle\vartheta\rangle, \langle\eta\rangle)$. Define $x_0 = \vartheta$ and $x_1 = \vartheta + \eta$, so that $\zeta = x_e$ holds. The quadruple $(e, z, x_0, x_1)$ is then given by the least significant bits of the corresponding field elements $(e, \zeta, x_0, x_1)$. This concludes the Share OT step.
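The Share OT step can be illustrated with a short honest-execution sketch. Assumptions: $\mathcal{F}_{\mathsf{Bootstrap}}(\eta)$ is replaced by a hypothetical local function that returns $\zeta_i = \vartheta_i + e_i\cdot\eta$ to $P_i$ and an additive sharing of a random $\vartheta_i$; field elements are 64-bit integers with XOR as addition; there are three parties. Taking least significant bits is $\mathbb{F}_2$-linear, so the relation $\zeta = x_e$ over $\mathbb{F}$ survives as $z = x_e$ over bits.

```python
import secrets
from functools import reduce
from operator import xor

K, N = 64, 3   # assumed field bit-length and number of parties


def rand_elem() -> int:
    return secrets.randbits(K)


def additive_sharing(value: int) -> list[int]:
    shares = [rand_elem() for _ in range(N - 1)]
    return shares + [value ^ reduce(xor, shares, 0)]


def ideal_bootstrap_share(e_i: int, eta: int):
    """Honest F_Bootstrap(eta) Share: P_i gets zeta_i; the parties share theta_i."""
    theta_i = rand_elem()
    return theta_i ^ (eta if e_i else 0), additive_sharing(theta_i)


def lsbs(shares: list[int]) -> list[int]:
    return [s & 1 for s in shares]


def share_ot():
    eta = rand_elem()
    eta_sh = additive_sharing(eta)                         # step 1: <eta>
    e_sh = [secrets.randbits(1) for _ in range(N)]         # step 2: e = e_1 + ... + e_n
    zeta_sh, theta_sh = [], [0] * N
    for e_i in e_sh:                                       # step 3: one Share per party
        zeta_i, theta_i_sh = ideal_bootstrap_share(e_i, eta)
        zeta_sh.append(zeta_i)                             # P_i keeps zeta_i
        theta_sh = [a ^ b for a, b in zip(theta_sh, theta_i_sh)]
    x0_sh = theta_sh                                       # step 4: <x0> = <theta>
    x1_sh = [a ^ b for a, b in zip(theta_sh, eta_sh)]      #         <x1> = <theta> + <eta>
    return e_sh, lsbs(zeta_sh), lsbs(x0_sh), lsbs(x1_sh)   # step 5


e, z, x0, x1 = (reduce(xor, s) for s in share_ot())
assert z == (x1 if e else x0)   # z = x_e
```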

To add MACs to each bit of the quadruple that the parties just generated, the protocol uses the $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$ instance to obtain a sharing $\langle\alpha\rangle$ of the global key. Each party can now authenticate his shares of $(e, z, x_0, x_1)$ by querying the Share command, obtaining $[\![e]\!]$, $[\![z]\!]$, $[\![x_0]\!]$, $[\![x_1]\!]$. We emphasize that the same $\alpha$ is used to authenticate all OT quadruples, thus $\mathcal{F}_{\mathsf{Bootstrap}}(\alpha)$ is fixed once and for all.

After the Authenticate OT step the parties have sharings $[\![e]\!]$, $[\![z]\!]$, $[\![x_0]\!]$, $[\![x_1]\!]$, which could suffer from two possible errors induced by the corrupted parties: firstly, the algebraic equation $z = x_e$ may not hold, and secondly, the MAC values may be inconsistent. For the latter problem we check all the partially opened values using the MACCheck procedure at the end of the offline phase. For the former we use the Sacrifice OT step. We use the same methodology as in [4,7,6], i.e. one quadruple is checked by "sacrificing" another quadruple. The idea behind sacrificing can be seen as follows: we associate to each pair of quadruples a polynomial $S(t)$ over the field of secrets ($\mathbb{F}_2$ in our case), which is the zero polynomial only if both quadruples are correct. Thus, proving correctness of quadruples is equivalent to proving that $S(t)$ is the zero polynomial. This is done by securely evaluating $S(t)$ on a random public challenge bit $t$ via a combination of addition gates and two openings (plus one extra opening to check the evaluation), and then checking that the result of the evaluation partially opens to zero. In this way we would waste $\kappa$ quadruples to check one quadruple, to get security of $2^{-\kappa}$; we refer the reader to Appendix A for a more efficient sacrifice procedure.
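Expanding step 11 of Protocol 8 shows what the polynomial $S(t)$ is. Write $m = z + x_0 + e\cdot(x_0 + x_1)$ for the quadruple under test and $m_i$ for the sacrificed one. The opened value is then $c_i = t_i\cdot m + m_i$. The sketch below checks this identity exhaustively in the clear: there is no sharing and there are no MACs, and the function names are hypothetical.

```python
import itertools


def relation(e, z, x0, x1):
    """m = z + x0 + e*(x0 + x1) over F_2; m == 0 exactly when z = x_e."""
    return z ^ x0 ^ (e & (x0 ^ x1))


def sacrifice_value(t, quad, sac):
    """c_i of step 11 in Protocol 8, computed in the clear for challenge bit t."""
    e, z, x0, x1 = quad            # quadruple under test
    e_i, z_i, x0_i, x1_i = sac     # sacrificed quadruple
    p = (t & (x0 ^ x1)) ^ x0_i ^ x1_i          # opened p_i
    q = e ^ e_i                                # opened q_i
    return (t & (z ^ x0)) ^ z_i ^ x0_i ^ (p & e) ^ (q & (x0_i ^ x1_i))


quads = list(itertools.product((0, 1), repeat=4))
for quad, sac, t in itertools.product(quads, quads, (0, 1)):
    assert sacrifice_value(t, quad, sac) == (t & relation(*quad)) ^ relation(*sac)
```

Since $t_i$ is only fixed after the quadruples are, a bad quadruple ($m = 1$) passes one check with probability exactly 1/2, whatever the sacrificed quadruple is. This is why $\kappa$ independent checks give the $2^{-\kappa}$ bound stated above.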

Theorem 2. Let $\kappa$ be the security parameter and $t \in \mathbb{N}$. In the $(\mathcal{F}_{\mathsf{Comm}}, \mathcal{F}_{\mathsf{Bootstrap}})$-hybrid model, the protocol $\Pi_{\mathsf{Prep}}$ securely implements $\mathcal{F}_{\mathsf{Prep}}$ with statistical security on $\kappa$ against any static adversary corrupting up to $n-1$ parties, assuming the existence of $\mathsf{PRF}^{A,m}(\cdot)$ with domain $A = \mathbb{F}$ (resp. $\mathbb{F}_2$) and $m = t$ (resp. $\kappa$).

Proof. See Appendix C.3.

7  Efficiency  Analysis 

As it stands, our protocol is not very efficient, mainly due to the naive sacrificing step performed in the offline phase to check the GaOTs for correctness. In Appendix A we present a much more efficient sacrifice step, which for reasonable parameters means that the ratio of GaOTs required per GaOT used is between four and six. Let this ratio be denoted $r$.

We examine the cost of a multiplication in terms of the number of aBits required in the case of two parties. Each GaOT consumes ten aBits. We need to execute the Share OT step to determine $e, z, x_0, x_1$, which requires one aBit consumption per player, i.e. two in total when $n = 2$. In addition, each of these four bits needs to be authenticated in the Authenticate OT step of Protocol 8, which again requires one aBit consumption per player for each bit, i.e. eight in total when $n = 2$.

Since we need one checked GaOT to perform a secure multiplication, and we sacrifice $r-1$ GaOTs to obtain a checked one, we require $r\cdot 10$ aBits per secure multiplication in the two-party case. Depending on the parameters we use for our sacrifice step in Appendix A, this equates to 40, 50 or 60 aBits per secure multiplication.
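The count above can be written as a small function. The text only gives the two-party figures. The general-$n$ rule in this sketch is an extrapolation and should be read as an assumption: each Share call of $\Pi_{\mathsf{Bootstrap}}$ executes $n-1$ aBits, Share OT makes one Share call per party, and Authenticate OT makes one per party for each of the four bits. For $n = 2$ it reproduces the ten aBits per GaOT and the 40, 50 or 60 aBits per multiplication stated above.

```python
def abits_per_gaot(n: int) -> int:
    """aBits consumed by one GaOT with n parties (assumed counting rule, see lead-in)."""
    bootstrap_shares = n          # Share OT: one Bootstrap Share per party (its bit e_i)
    bootstrap_shares += 4 * n     # Authenticate OT: one per party for each of e, z, x0, x1
    return bootstrap_shares * (n - 1)   # each Bootstrap Share executes n - 1 aBits


def abits_per_multiplication(n: int, r: int) -> int:
    """r GaOTs are produced for every checked GaOT used by one multiplication."""
    return r * abits_per_gaot(n)


print([abits_per_multiplication(2, r) for r in (4, 5, 6)])   # [40, 50, 60]
```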

We now compare this to the number of aBits needed in the Tiny-OT protocol [13]. In this protocol each secure multiplication requires two aBits, two aANDs and two aOTs. Assuming a bucket size of four in the protocols to generate aANDs and aOTs, each aAND (resp. aOT) requires four LaANDs (resp. LaOTs). Each LaAND requires four aBits and each LaOT requires three aBits. Thus the total number of aBits per secure multiplication is $2\cdot(1 + 4\cdot 4 + 4\cdot 3) = 2\cdot 29 = 58$. We therefore see that we can make our protocol (in the two-party case, with $r = 4$ or $r = 5$) more efficient than the Tiny-OT protocol, when we measure efficiency in terms of the number of aBits consumed.
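The comparison can be reproduced with a few lines of arithmetic. The constants below are exactly those stated in the text for Tiny-OT (bucket size four, four aBits per LaAND, three per LaOT). The "ours" figures use the ten aBits per GaOT derived above.

```python
BUCKET = 4          # bucket size assumed for the aAND and aOT protocols
ABITS_PER_LAAND = 4
ABITS_PER_LAOT = 3

# Two aBits, two aANDs and two aOTs per multiplication, written as 2 * (1 + ...).
per_half = 1 + BUCKET * ABITS_PER_LAAND + BUCKET * ABITS_PER_LAOT
tiny_ot = 2 * per_half

ours = {r: 10 * r for r in (4, 5, 6)}   # 10 aBits per GaOT, r GaOTs per multiplication
print(tiny_ot, ours)                     # 58 {4: 40, 5: 50, 6: 60}
```

Only the settings $r = 4$ and $r = 5$ fall below 58 aBits; with $r = 6$ the two protocols are roughly on par (60 against 58).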
A Batching the Sacrifice Step

This technique (an adaptation of a technique originally found in [14,6,9]) allows a batch of OT quadruples to be checked for algebraic correctness using fewer "sacrificed" quadruples than the basic version we described in Section 6. Recall that the idea is to check that an authenticated OT-quadruple $\mathrm{GaOT}_i = ([\![e_i]\!], [\![z_i]\!], [\![x_i]\!], [\![y_i]\!])$ satisfies the "multiplicative" relation $m_i = z_i + x_i + e_i\cdot(x_i + y_i) = 0$.

At a high level, Protocol 9 consists of two phases. Let $(\mathrm{GaOT}_1, \ldots, \mathrm{GaOT}_N)$ be a set of OT quadruples. In the first phase a fixed portion of these GaOTs is partially opened, as in a classical cut-and-choose step. If any of the opened OT quadruples does not satisfy the multiplicative relation, the protocol aborts. Otherwise it runs the second phase: the remaining GaOTs are permuted and uniformly distributed into $t$ buckets of size $T$. Then, for each of the buckets, the protocol selects a BucketHead, i.e. the first (in the lex order) GaOT in the bucket (as in [9]), and uses the remaining GaOTs in the same bucket to check that the BucketHead correctly satisfies the multiplicative relation. If any BucketHead does not pass the test, then we know that some parties are corrupted and the protocol aborts. If all the checks pass, then with overwhelming probability we obtain $t$ algebraically correct BucketHeads, i.e. $t$ OT quadruples.
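The two phases can be simulated on the relation bits alone, which is enough to see which adversarial patterns survive. This is a cleartext sketch under assumptions: `m_bits[j] = 1` marks a bad $\mathrm{GaOT}_j$, a seeded `random.Random` stands in for the jointly keyed $\mathsf{PRF}_s$, and the pairwise test uses the fact that a bucket check opens $m_{\mathrm{head}} + m_j$ (verified in the sketch after Protocol 9). The function name is hypothetical.

```python
import random


def bucket_cut_and_choose(m_bits, T, h, t, rng):
    """Cleartext simulation of Protocol 9; m_bits[j] = 1 marks a bad GaOT_j.

    Returns the indices of the t BucketHeads, or None if the protocol aborts."""
    N = (T + h) * t
    assert len(m_bits) == N
    idx = list(range(N))
    rng.shuffle(idx)
    opened, rest = idx[: h * t], idx[h * t:]      # Phase I: open h*t random GaOTs
    if any(m_bits[j] for j in opened):
        return None
    rng.shuffle(rest)                              # Phase II: random bucketing
    heads = []
    for b in range(t):
        bucket = rest[b * T:(b + 1) * T]
        head = bucket[0]                           # BucketHead: first element of the bucket
        if any(m_bits[head] ^ m_bits[j] for j in bucket[1:]):   # c = m_head + m_j
            return None
        heads.append(head)
    return heads


rng = random.Random(2015)   # stands in for PRF_s keyed by the jointly generated seed
print(bucket_cut_and_choose([0] * 32, T=3, h=1, t=8, rng=rng))
```

A single bad GaOT is always caught: it is either opened in Phase I or lands in a bucket with at least one good GaOT. Only buckets consisting entirely of bad GaOTs can slip through, which is the event bounded in Theorem 3.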


Bucket  Cut-and-Choose  Protocol 

Input: Let $N = (T + h)\cdot t$ be the number of input GaOTs and $T$ the size of the buckets, with $T \ge 2$. We let $1 \le h \le T$ denote an additional parameter controlling how much cut-and-choose we perform.

Phase-I Cut-and-Choose:

1. Every party $P_i$ samples a seed $s_i$ and asks $\mathcal{F}_{\mathsf{Comm}}$ to broadcast $\tau_i = \mathsf{Comm}(s_i)$.

2. Every party $P_i$ calls $\mathcal{F}_{\mathsf{Comm}}$ with $\mathsf{Open}(\tau_i)$ and all parties obtain $s_j$ for all $j$. Set $s = s_1 + \cdots + s_n$.

3. Using $\mathsf{PRF}^{\mathbb{F}_2,N}_s$, parties sample a random vector $v \in \mathbb{F}_2^N$ such that the number of its non-zero entries is $h\cdot t$ (i.e. the Hamming weight of $v$ is $h\cdot t$).

4. Let $J$ be the set of indices $j$ such that $v_j \neq 0$. For all $j \in J$, the parties partially open $\mathrm{GaOT}_j$ and check that it satisfies the algebraic relation $z_j + x_j = e_j\cdot(x_j + y_j)$. If there exists an algebraically incorrect quadruple $\mathrm{GaOT}_j$, then the protocol aborts.

Phase-II Bucket-Sacrifice:

5. Permute the unopened GaOTs according to a random permutation $\pi$ on $T\cdot t$ indices, again using $\mathsf{PRF}_s$. Then renumber the permuted unopened quadruples $\mathrm{GaOT}_j$, such that $j = 1, \ldots, T\cdot t$, and, for $i = 1, \ldots, t$, create the $i$-th bucket as $\{\mathrm{GaOT}_j\}_{j=(i-1)T+1}^{iT}$.

6. Parties compute a BucketHead$(i)$ for each $i = 1, \ldots, t$, i.e. the first (in the lex order) element in the $i$-th bucket.

7. For $i = 1, \ldots, t$, parties check that BucketHead$(i) = ([\![e_i]\!], [\![z_i]\!], [\![x_i]\!], [\![y_i]\!])$ is correct using the other GaOTs in the bucket. For $j = iT - T + 2, \ldots, iT$ do:

- Set $\mathrm{CheckGaOT}_j = \mathrm{GaOT}_j = ([\![e_j]\!], [\![z_j]\!], [\![x_j]\!], [\![y_j]\!])$.

- Parties open $(e_i + e_j)$ and $(x_i + y_i + x_j + y_j)$.

- Parties locally compute

$$[\![c_{i,j}]\!] = [\![z_i + x_i]\!] + [\![z_j + x_j]\!] + (e_i + e_j)\cdot[\![x_j + y_j]\!] + (x_i + y_i + x_j + y_j)\cdot[\![e_i]\!],$$

and check that it partially opens to zero.

- If all checks go through, output BucketHead$(i)$ as a valid quadruple; otherwise abort.

8. The parties execute the protocol $\Pi_{\mathsf{MACCheck}}$ to check all partially opened values.

Protocol  9  Bucket  Cut-and-Choose  Protocol 
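The bucket check in step 7 opens exactly the sum of the two relation values, $c_{i,j} = m_i + m_j$, which the proof below relies on. The following cleartext sketch (no sharing, no MACs, hypothetical function names) verifies this identity over all pairs of bit quadruples $(e, z, x, y)$.

```python
import itertools


def m(e, z, x, y):
    """Relation value m = z + x + e*(x + y) over F_2 (0 for a correct GaOT)."""
    return z ^ x ^ (e & (x ^ y))


def bucket_check(head, other):
    """c_{i,j} from step 7 of Protocol 9, computed in the clear."""
    e_i, z_i, x_i, y_i = head
    e_j, z_j, x_j, y_j = other
    opened_e = e_i ^ e_j                      # first opened value
    opened_xy = x_i ^ y_i ^ x_j ^ y_j         # second opened value
    return (z_i ^ x_i) ^ (z_j ^ x_j) ^ (opened_e & (x_j ^ y_j)) ^ (opened_xy & e_i)


quads = list(itertools.product((0, 1), repeat=4))
assert all(bucket_check(a, b) == m(*a) ^ m(*b) for a in quads for b in quads)
```

In particular, two bad quadruples cancel ($1 + 1 = 0$), whereas a bad quadruple paired with a good one always yields 1.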


Theorem 3. For $T \ge (\kappa + \log_2 t)/\log_2 t$, the previous protocol provides $t$ correct GaOTs with error probability at most $2^{-\kappa}$.
Proof (sketch). It is easy to check that the protocol is correct and secure in the semi-honest model, i.e. if all the OT quadruples are honestly generated according to the GaOT command in $\Pi_{\mathsf{Prep}}$, then $c_{i,j} = 0$ for all $i, j$.

The argument for active security is as follows. A badGaOT, i.e. an OT quadruple which does not satisfy the multiplicative relation, passes the test if and only if all the GaOTs partially opened in the cut-and-choose phase are correct and it then ends up in a bucket containing only badGaOTs. This is because if we combine two badGaOTs, say $\mathrm{GaOT}_i$ and $\mathrm{GaOT}_j$, we obtain $c_{i,j} = m_i + m_j = 1 + 1 = 0$, and the test passes. We show that this happens with negligible probability for an appropriate choice of the parameters. We argue this in two steps: first we prove that when a bucket contains at least one goodGaOT (an OT quadruple satisfying the multiplicative relation) a badGaOT will always be detected, and then we bound the probability of having buckets containing only badGaOTs.

If parties misbehaved in any previous step yielding a bad $\mathrm{GaOT}_i$, then when we combine it with a good $\mathrm{GaOT}_j$ we get $c_{i,j} = m_i + m_j = 1$ and the check fails. Notice that the protocol always aborts if there is a bucket with both bad and good GaOTs. More precisely, the protocol checks the algebraic correctness of the BucketHeads, but indirectly also that of all other GaOTs (we use the BucketHead notation so that each GaOT is paired only once with a different GaOT).

Let

- PassICheck be the event that the protocol does not abort in the cut-and-choose step;

- mbadGaOT be the event that $m$ GaOTs are bad (note that we fix $m$ here);

- NoMixedBucket be the event that there are no buckets containing both goodGaOTs and badGaOTs.

We bound the probability that both PassICheck and NoMixedBucket occur. To do this we prove:

1. $\Pr[E_1] = \Pr[\mathsf{PassICheck} \wedge \mathsf{mbadGaOT}] \le \left(\frac{T}{T+h}\right)^m$;

2. $\Pr[E_2] = \Pr[E_1 \wedge \mathsf{NoMixedBucket}] \le 2^{\log_2(t)(1-T)}$.

The first point is straightforward. First note that if $m > T\cdot t$, then fewer than $h\cdot t$ good GaOTs exist, so $\Pr[\mathsf{PassICheck}] = 0$ and the protocol aborts; similarly, if $m < T$, then $\Pr[\mathsf{NoMixedBucket}] = 0$, so we can suppose $T \le m \le T\cdot t$ (in particular $m > 1$). Moreover, as a bad BucketHead will always be detected if a bucket contains both good and bad GaOTs, we add the condition $m = k\cdot T$, $k = 1, \ldots, t$. In this way, if $m$ denotes the number of badGaOTs and PassICheck is true, then the $h\cdot t$ GaOTs that are opened in the cut-and-choose step are sampled from the $N - m$ good GaOTs. It holds that:

$$\Pr[E_1] = \binom{N-m}{h t}\binom{N}{h t}^{-1} = \binom{(T+h)t-m}{Tt-m}\binom{(T+h)t}{Tt}^{-1}$$

$$= \frac{((T+h)t-m)\cdots(ht+1)\cdot(Tt)!}{(Tt+ht)\cdots(ht+1)\cdot(Tt-m)!} = \frac{(Tt)\cdots(Tt-m+1)\cdot(Tt-m)!}{(Tt+ht)\cdots(Tt+ht-m+1)\cdot(Tt-m)!} \le \left(\frac{Tt}{Tt+ht}\right)^m = \left(\frac{T}{T+h}\right)^m.$$

Now we compute the probability of $\mathsf{NoMixedBucket} \wedge E_1$. Recall that each of the $t$ buckets has cardinality $T$ and that we are assuming $m = k\cdot T$ bad GaOTs. It is easy to see that

$$\Pr[E_2] \le \left(\frac{T}{T+h}\right)^{kT}\binom{t}{k}\binom{Tt}{kT}^{-1}.$$

The term $\binom{t}{k}\binom{Tt}{kT}^{-1}$ is symmetric with respect to the value $k = t/2$, as $\binom{t}{k} = \binom{t}{t-k}$ and $\binom{Tt}{kT} = \binom{Tt}{(t-k)T}$ for $k = 1, \ldots, t$, and it strictly decreases for $1 \le k \le t/2$; the term $\left(\frac{T}{T+h}\right)^{kT}$ is less than 1 and it decreases as $k$ grows. So when we multiply the two terms, the above probability for values of $k$ in $[1, \ldots, t/2]$ is bigger than the same probability for the "symmetric" values in $]t/2, \ldots, t]$, and the maximum is attained at $k = 1$. Substituting this value in the previous expression, and using $\binom{Tt}{T} \ge t^T$, we get:

$$\Pr[E_2] \le \left(\frac{T}{T+h}\right)^{T}\cdot t^{(1-T)} \le 2^{\log_2(t)(1-T)}.$$

Thus for $T \ge (\kappa + \log_2 t)/\log_2 t$ we obtain $\Pr[E_2] \le 2^{-\kappa}$.

□ 
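The two bounds in the proof can be spot-checked numerically with exact binomial coefficients. The sketch uses small illustrative parameters ($T = 3$, $h = 1$, $t = 16$, chosen arbitrarily) and checks, for every $m = kT$, that the exact cut-and-choose survival probability is at most $(T/(T+h))^m$ and that its product with the no-mixed-bucket probability stays below $t^{1-T}$. This is a sanity check of the stated inequalities for one parameter set, not a proof.

```python
from math import comb


def pass_cut_and_choose(T: int, h: int, t: int, m: int) -> float:
    """Exact probability that the h*t opened GaOTs avoid all m bad ones."""
    N = (T + h) * t
    return comb(N - m, h * t) / comb(N, h * t)


def no_mixed_bucket(T: int, t: int, k: int) -> float:
    """Probability that m = k*T bad GaOTs exactly fill k of the t buckets of size T."""
    return comb(t, k) / comb(T * t, k * T)


T, h, t = 3, 1, 16   # illustrative parameters only
for k in range(1, t + 1):
    m = k * T
    p_e1 = pass_cut_and_choose(T, h, t, m)
    assert p_e1 <= (T / (T + h)) ** m                    # first point of the proof
    assert p_e1 * no_mixed_bucket(T, t, k) <= t ** (1 - T)   # second point, for this k
```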


We can replace the Sacrifice OT step in $\Pi_{\mathsf{Prep}}$ with the above Bucket Cut-and-Choose Protocol, and Theorem 2, together with its proof, still holds.

Notice how the value $h$ has little effect on the final probability (we suppressed its effect in the statement of Theorem 3 since it is so small). This means we can take $h = 1$ to obtain the most efficient protocol, so the amount of cut-and-choose performed is relatively low.

To measure the efficiency of this protocol we can consider the ratio $r = N/t = T + h$, which measures the number of GaOTs that we need to produce one actively secure OT quadruple. We obtain the following table, all with $h = 1$ and an error probability of $2^{-40}$.


  r    T = r - h    t       (40 + log2(t)) / log2(t)
  4    3            2^20    3
  5    4            2^14    3.85
  6    5            2^10    5
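The table rows can be recomputed from the condition of Theorem 3 with $\kappa = 40$ and $h = 1$. The helper name `min_bucket_size` is hypothetical. The loop also checks the equivalent form $\log_2(t)\cdot(T-1) \ge \kappa$, i.e. $2^{\log_2(t)(1-T)} \le 2^{-40}$.

```python
import math

KAPPA = 40   # error probability 2^-40, as in the table
H = 1        # cut-and-choose parameter h used in the table


def min_bucket_size(kappa: int, log_t: int) -> int:
    """Smallest integer T with T >= (kappa + log2 t) / log2 t (Theorem 3)."""
    return math.ceil((kappa + log_t) / log_t)


for log_t, T in ((20, 3), (14, 4), (10, 5)):
    assert T >= min_bucket_size(KAPPA, log_t)
    assert log_t * (T - 1) >= KAPPA    # same condition: 2^(log2(t)(1 - T)) <= 2^-KAPPA
    print(f"r = {T + H}, T = {T}, t = 2^{log_t}, bound = {(KAPPA + log_t) / log_t:.2f}")
```

For $t = 2^{14}$ the bound is $54/14 \approx 3.857$, which the table reports truncated as 3.85; in every row the chosen $T$ is the smallest integer meeting the bound.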

B  Information  Theoretic  Tags  for  Dishonest  Majority 

In the online phase, parties work with representations carrying information-theoretic message authentication codes. The key properties of the MACs are that they are homomorphic and hold enough entropy to convince an honest party that local computation has been done correctly. The homomorphic property allows us to postpone the correctness check of the MACs until the very end of the circuit evaluation (where the circuit can be the one implicitly used in the preprocessing or the target online circuit). In [6] it was shown how to perform the check on partially opened values whilst keeping the key secret, hence enabling support for reactive online evaluations; this is the method we use. See Protocol 10 for details. The procedure utilizes an ideal functionality $\mathcal{F}_{\mathsf{Comm}}$ for commitments given in Figure 11. An implementation of $\mathcal{F}_{\mathsf{Comm}}$ in the random oracle model can be found in the Appendix of [6].

In  order  to  understand  the  probability  of  an  adversary  being  able  to  cheat  during  the  execution 
of  Protocol  10,  the  authors  in  [6]  used  a  security  game  approach,  which  in  turn  was  an  adaptation 
of  the  one  in  [7].  For  completeness,  we  state  here  both  the  protocol  and  the  security  game. 

The adversary wins the game if there is an $i \in \{1, \ldots, t\}$ for which $b_i \neq a_i$, and the check goes through. The second step in the game, where the adversary sends the $b_i$'s, models the fact that
Protocol $\Pi_{\mathsf{MACCheck}}$

Usage: The parties have a set of $[\![a_i]\!]$ sharings and public bits $b_i$, for $i = 1, \ldots, t$, and they wish to check that $a_i = b_i$, i.e. they want to check whether the public values are consistent with the shared MACs held by the parties.

As input the system has sharings $(\langle\alpha\rangle, \{b_i, \langle\mu(a_i)\rangle\}_{i=1}^t)$. If the MAC values are correct, then $\mu(a_i) = b_i\cdot\alpha$ for all $i$.

MACCheck$(\{b_1, \ldots, b_t\})$:

1. Every party $P_i$ samples a seed $s_i$ and asks $\mathcal{F}_{\mathsf{Comm}}$ to broadcast $\tau_i = \mathsf{Comm}(s_i)$.

2. Every party $P_i$ calls $\mathcal{F}_{\mathsf{Comm}}$ with $\mathsf{Open}(\tau_i)$ and all parties obtain $s_j$ for all $j$.

3. Set $s = s_1 + \cdots + s_n$.

4. Parties sample a random vector $\chi = \mathsf{PRF}^{\mathbb{F},t}_s(0) \in \mathbb{F}^t$; note that all parties obtain the same vector as they have agreed on the seed $s$.

5. Each party computes the public value $b = \sum_{i=1}^t \chi_i\cdot b_i \in \mathbb{F}$.

6. The parties locally compute the sharings $\langle\mu(a)\rangle = \chi_1\cdot\langle\mu(a_1)\rangle + \cdots + \chi_t\cdot\langle\mu(a_t)\rangle$ and $\langle\sigma\rangle = \langle\mu(a)\rangle - b\cdot\langle\alpha\rangle$.

7. Party $P_i$ asks $\mathcal{F}_{\mathsf{Comm}}$ to broadcast his share $\tau'_i = \mathsf{Comm}(\sigma_i)$.

8. Every party calls $\mathcal{F}_{\mathsf{Comm}}$ with $\mathsf{Open}(\tau'_i)$, and all parties obtain $\sigma_j$ for all $j$.

9. If $\sigma_1 + \cdots + \sigma_n \neq 0$, the parties output $\emptyset$ and abort; otherwise they accept all $b_i$ as valid authenticated bits.


Protocol  10  Method  to  Check  MACs  on  Partially  Opened  Values 
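The following Python sketch walks through the nine steps of Protocol 10 with all parties simulated in one process. It is illustrative only and rests on assumptions not made in the text: $\mathbb{F}$ is instantiated as $\mathrm{GF}(2^{64})$ with modulus $x^{64}+x^4+x^3+x+1$, SHA-256 stands in for the PRF, and $\mathcal{F}_{\mathrm{Comm}}$ is replaced by a hash commitment that is opened right after all commitments are made. All function names are hypothetical.

```python
import hashlib
import secrets

# Assumed field: GF(2^64) with modulus x^64 + x^4 + x^3 + x + 1; addition is XOR.
MODULUS = (1 << 64) | 0x1B


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^64)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 64:
            a ^= MODULUS
    return r


def share(value: int, n: int) -> list[int]:
    """Additive (XOR) sharing of a field element among n parties."""
    parts = [secrets.randbits(64) for _ in range(n - 1)]
    last = value
    for p in parts:
        last ^= p
    return parts + [last]


def commit_then_open(values: list[int]) -> list[int]:
    """Stand-in for F_Comm: hash-commit to every value, then open and verify all of them."""
    pads = [secrets.token_bytes(16) for _ in values]
    comms = [hashlib.sha256(p + v.to_bytes(8, "big")).digest() for p, v in zip(pads, values)]
    for c, p, v in zip(comms, pads, values):
        if hashlib.sha256(p + v.to_bytes(8, "big")).digest() != c:
            raise ValueError("commitment opening failed")
    return values


def prf_vector(seed: int, t: int) -> list[int]:
    """x = PRF_s(0) in F^t, with SHA-256 used as an assumed PRF."""
    out = []
    for i in range(t):
        h = hashlib.sha256(seed.to_bytes(8, "big") + i.to_bytes(4, "big")).digest()
        out.append(int.from_bytes(h[:8], "big"))
    return out


def mac_check(b: list[int], alpha_sh: list[int], mac_sh: list[list[int]]) -> bool:
    """b[i]: public bit; alpha_sh[j]: P_j's share of alpha; mac_sh[i][j]: P_j's share of mu(a_i)."""
    n, t = len(alpha_sh), len(b)
    s = 0  # Steps 1-3: commit to seeds, open them, combine into the joint seed s.
    for s_j in commit_then_open([secrets.randbits(64) for _ in range(n)]):
        s ^= s_j
    x = prf_vector(s, t)  # Step 4: every party derives the same vector x.
    b_comb = 0  # Step 5: b = sum_i x_i * b_i.
    for x_i, b_i in zip(x, b):
        b_comb ^= gf_mul(x_i, b_i)
    sigma = []  # Step 6: sigma_j = sum_i x_i * mu_j(a_i) - b * alpha_j.
    for j in range(n):
        mu_j = 0
        for i in range(t):
            mu_j ^= gf_mul(x[i], mac_sh[i][j])
        sigma.append(mu_j ^ gf_mul(b_comb, alpha_sh[j]))
    total = 0  # Steps 7-9: commit to and open the sigma shares, then check their sum.
    for s_j in commit_then_open(sigma):
        total ^= s_j
    return total == 0


if __name__ == "__main__":
    n, alpha = 3, secrets.randbits(64)
    alpha_sh = share(alpha, n)
    bits = [1, 0, 1, 1]
    mac_sh = [share(gf_mul(b_i, alpha), n) for b_i in bits]  # mu(a_i) = a_i * alpha
    print(mac_check(bits, alpha_sh, mac_sh))          # honest opening
    print(mac_check([1, 1, 1, 1], alpha_sh, mac_sh))  # lies about the second bit
```

With correct MACs the opened $\sigma$-shares always sum to zero, so the first call prints `True`; lying about the second bit leaves $\sigma = x_2 \cdot \alpha$, so the second call should print `False` except with probability roughly $2^{-63}$.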


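Game: Security of the MACCheck procedure assuming pseudorandom functions

1: The challenger samples a random sharing $\langle\alpha\rangle$ with $\alpha \in \mathbb{F}$. It sets $\langle\mu(a_i)\rangle = a_i \cdot \langle\alpha\rangle$ and sends bits $a_1, \ldots, a_t$ to the adversary.
2: The adversary sends back bits $b_1, \ldots, b_t$.

3: The challenger generates random values $x_1, \ldots, x_t \in \mathbb{F}$ and sends them to the adversary.

4: The adversary provides an error $\Delta \in \mathbb{F}$.

5: Set $b = \sum_{i=1}^{t} x_i \cdot b_i$, and sharings $\langle\mu(a)\rangle = \sum_{i=1}^{t} x_i \cdot \langle\mu(a_i)\rangle$ and $\langle\sigma\rangle = \langle\mu(a)\rangle - b \cdot \langle\alpha\rangle$. The challenger checks
that $\sigma = \Delta$.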


corrupted parties can choose to lie about their shares of values opened during the execution of the
parent protocol. The offset $\Delta$ models the fact that the adversary is allowed to introduce errors on
the MACs. A formal proof of Theorem 4 can be found in the Appendix of [7, 6].


Theorem 4 ([6]). The protocol MACCheck is correct, i.e. it accepts if all the public values $b_i$ and
the corresponding MACs are correctly computed. Moreover, it is sound, i.e. it rejects except with
probability $2/|\mathbb{F}|$ in case at least one value, or MAC, is not correctly computed.
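To make the $2/|\mathbb{F}|$ bound of Theorem 4 concrete, the sketch below evaluates the game exactly for $t = 1$ and a single lie $b_1 \neq a_1$, in the toy field $\mathrm{GF}(2^4)$ (an assumption chosen only so that exhaustive enumeration is cheap; it is not the field used in [6]). In that case $\sigma = \alpha \cdot x_1 \cdot (a_1 + b_1) = \alpha \cdot x_1$, so an adversary that never sees $\alpha$ wins exactly when its offset $\Delta$ equals $\alpha x_1$. The function names are hypothetical.

```python
from fractions import Fraction

K, MODULUS = 4, 0b10011  # toy field GF(2^4) = F_2[X]/(X^4 + X + 1)


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); addition is XOR."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> K:
            a ^= MODULUS
    return r


def win_probability(delta: int) -> Fraction:
    """Exact probability over (alpha, x_1) that sigma = alpha * x_1 equals delta."""
    size = 1 << K
    wins = sum(
        1
        for alpha in range(size)
        for x1 in range(size)
        if gf_mul(alpha, x1) == delta
    )
    return Fraction(wins, size * size)


best = max(win_probability(d) for d in range(1 << K))
print(best, best <= Fraction(2, 1 << K))  # 31/256 True
```

The best offset is $\Delta = 0$, which wins whenever $\alpha = 0$ or $x_1 = 0$: $31/256$ of all pairs, just under $2/|\mathbb{F}| = 32/256$.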


The Functionality $\mathcal{F}_{\mathrm{Comm}}$

Commit: On input $(\mathrm{Comm}, v, i, \tau_v)$ by $P_i$ or the adversary on his behalf (if $P_i$ is corrupt), where $v$ is either in
a specific domain or $\perp$, it stores $(v, i, \tau_v)$ on a list and outputs $(i, \tau_v)$ to all parties and adversary.

Open: On input $(\mathrm{Open}, i, \tau_v)$ by $P_i$ or the adversary on his behalf (if $P_i$ is corrupt), the ideal functionality
outputs $(v, i, \tau_v)$ to all parties and adversary. If $(\mathrm{NoOpen}, i, \tau_v)$ is given by the adversary, and $P_i$ is corrupt,
the functionality outputs $(\perp, i, \tau_v)$ to all parties.

Figure  11  Ideal  Commitments 
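A toy bookkeeping model of $\mathcal{F}_{\mathrm{Comm}}$ may help make Figure 11 concrete. The class and method names are hypothetical, broadcasts are modelled simply as return values, and `None` stands for $\perp$; there is no cryptography here, because an ideal functionality hides the committed value by construction.

```python
BOTTOM = None  # models the special symbol ⊥


class IdealCommitments:
    """Toy model of F_Comm (Figure 11): a trusted list of (v, i, tau_v) entries."""

    def __init__(self, corrupt: set[int]):
        self.corrupt = corrupt
        self.store: dict[str, tuple[object, int]] = {}  # handle tau_v -> (v, i)

    def commit(self, v: object, i: int, tau: str) -> tuple[int, str]:
        if tau in self.store:
            raise ValueError("handle already in use")
        self.store[tau] = (v, i)
        return (i, tau)  # everyone learns only (i, tau_v), never v itself

    def open(self, i: int, tau: str) -> tuple[object, int, str]:
        v, owner = self.store[tau]
        if owner != i:
            raise ValueError("only the committing party can open")
        return (v, i, tau)

    def no_open(self, i: int, tau: str) -> tuple[object, int, str]:
        # Only a corrupt committer may refuse to open; everyone then sees ⊥.
        if i not in self.corrupt or self.store.get(tau, (None, -1))[1] != i:
            raise ValueError("only a corrupt committer can refuse to open")
        return (BOTTOM, i, tau)


f = IdealCommitments(corrupt={2})
print(f.commit(42, 1, "tau1"))  # (1, 'tau1')
print(f.open(1, "tau1"))        # (42, 1, 'tau1')
```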
C Security Proofs

C.1 Proof of the Bootstrap Step (Lemma 1)

We show that an environment $\mathcal{Z}$ corrupting up to $n - 1$ parties, playing with $\Pi_{\mathrm{Bootstrap}}$ attached to
$\mathcal{F}_{\mathrm{aBit}}$ or with the simulator $\mathcal{S}$ attached to $\mathcal{F}_{\mathrm{Bootstrap}}$, sees transcripts that are identically distributed.
We assume authenticated communication between parties, that is, they are given access to a functionality
$\mathcal{F}_{\mathrm{AT}}$ which, on input $(m, s, s')$ from $P_s$, gives message $m$ to $P_{s'}$ and also leaks it to $\mathcal{Z}$. In
a nutshell, the simulator runs a copy of $\Pi_{\mathrm{Bootstrap}}$ acting on behalf of honest parties. Let $\mathcal{A}$ be the
set of indices of corrupted parties; parties in $\mathcal{A}$ are indexed with $j$, and parties not in $\mathcal{A}$ with $k$.

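We start by describing the behaviour of $\mathcal{S}$. The corruption is static, so we can distinguish two
cases:

a) $P_i$ is honest.

1. In step 1, for $s \in \mathcal{P}$, $\mathcal{S}$ engages in a run of $\Pi_{\text{2-Share}}(\text{2-Share}, i, s)$ with $\mathcal{Z}$, acting on behalf
of $P_i$ and honest $P_k$. It sets an internal copy of $\mathcal{F}_{\mathrm{aBit}}$ to generate representations $[r_{ij}]_{\delta'_j}$ on
dummy bits $r_{ij}$. It answers queries from $\mathcal{Z}$ by sending him $\{r'_{ij}, \delta'_j\}_{j\in\mathcal{A}}$. $\mathcal{S}$ also gives random
$\sigma_k$ to $\mathcal{Z}$ for $k \notin \mathcal{A}$, and gets back $\sigma^*_j$ for $j \in \mathcal{A}$ (acting as $\mathcal{F}_{\mathrm{AT}}$). It then sets $\delta^*_j = \sigma^*_j + \delta'_j$.

2. $\mathcal{S}$ sends $\{\delta^*_j\}_{j\in\mathcal{A}}$ to $\mathcal{F}_{\mathrm{Bootstrap}}$ as part of Initialize.

3. In step 3, $\mathcal{S}$, acting as $\mathcal{F}_{\mathrm{AT}}$, gives random $d_s$ to $\mathcal{Z}$, $\forall s \neq i$. Note that $\nu^*_j = r'_{ij} + d_j \cdot \delta^*_j$ is the
purported share that corrupt $P_j$ should come up with.

4. $\mathcal{S}$ sends $\{\nu^*_j\}_{j\in\mathcal{A}}$ to $\mathcal{F}_{\mathrm{Bootstrap}}$.

b) $P_i$ is dishonest ($\mathcal{Z}$ specifies input bit $x$).

1. In step 1, for $s \in \mathcal{P}$, $\mathcal{S}$ engages in a run of $\Pi_{\text{2-Share}}(\text{2-Share}, i, s)$ with $\mathcal{Z}$, acting on behalf of
honest $P_k$. It sets an internal copy of $\mathcal{F}_{\mathrm{aBit}}$ to generate representations $[r_{is}]_{\delta'_s}$ on dummy bits
$r_{is}$, for $s \neq i$. $\mathcal{S}$ answers queries from $\mathcal{Z}$ by sending him $P_i$'s data on these representations and $\{r'_{ij}, \delta'_j\}_{j\in\mathcal{A}}$.
Acting as $\mathcal{F}_{\mathrm{AT}}$, $\mathcal{S}$ gives random $\sigma_k$ to $\mathcal{Z}$ and gets back $\sigma^*_j$. It then sets corrupt $\delta^*_j = \sigma^*_j + \delta'_j$.
$\mathcal{S}$ also extracts $\nu^*_j = r'_{ij} + (x + r_{ij}) \cdot \delta^*_j$, for $j \in \mathcal{A}$, together with $\mu^*$ and $\nu^*_i = x \cdot \delta^*_i$.

2. $\mathcal{S}$ sends $\{\delta^*_j\}_{j\in\mathcal{A}}$ to $\mathcal{F}_{\mathrm{Bootstrap}}$ as part of Initialize.

3. In step 3, $\mathcal{S}$ gets bits $d^*_s$ for $s \neq i$ via $\mathcal{F}_{\mathrm{AT}}$, and for each $k \notin \mathcal{A}$ sets the flag shift-$P_k$ to true
if $d^*_k \neq r_{ik} + x$.

4. $\mathcal{S}$ sends $\{\text{shift-}P_k\}_{k\notin\mathcal{A}}$, $\mu^*$, $x$ to $\mathcal{F}_{\mathrm{Bootstrap}}$.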

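Case honest $P_i$. First, we show that $\mathcal{F}_{\mathrm{Bootstrap}}$ and $\Pi_{\mathrm{Bootstrap}}$ output identically distributed values
if $\mathcal{Z}$ is honest-but-curious. In $\Pi_{\mathrm{Bootstrap}}$, the parties obtain sharings $\langle\delta\rangle$, $\langle\nu\rangle$, and party $P_i$ provides
input bit $x$ and also obtains a field element $\mu$. Then, recalling that $\mathbb{F}$ has characteristic 2 (so that $x \cdot \delta + x \cdot \delta = 0$), we have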

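$$
\begin{aligned}
\sum_{s\in\mathcal{P}} \nu_s + x\cdot\delta
&= (\epsilon + x\cdot\delta_i) + \Big(\sum_{s\neq i} r'_{i,s} + d_s\cdot\delta_s\Big) + x\cdot\delta \\
&= (\epsilon + x\cdot\delta_i) + \Big(\sum_{s\neq i} (\mu_{i,s} + r_{i,s}\cdot\delta_s) + (x + r_{i,s})\cdot\delta_s\Big) + x\cdot\delta \\
&= (\epsilon + x\cdot\delta_i) + \Big(\sum_{s\neq i} \mu_{i,s} + x\cdot\delta_s\Big) + x\cdot\delta \\
&= \epsilon + \sum_{s\neq i} \mu_{i,s} \\
&= \mu.
\end{aligned}
$$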
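The identity above can be checked numerically. The sketch follows the reading of the bootstrap Share step used in this proof: $P_s$ moves the $\mathcal{F}_{\mathrm{aBit}}$ key $\delta'_s$ to $\delta_s$ by revealing $\sigma_s = \delta_s + \delta'_s$, $P_i$ publishes $d_s = x + r_{i,s}$, and $\nu_s = r'_{i,s} + d_s \cdot \delta_s$. It assumes the toy field $\mathrm{GF}(2^8)$, simulates only honest behaviour, and its function names are hypothetical.

```python
import secrets

MODULUS = 0x11B  # toy field GF(2^8) = F_2[X]/(X^8 + X^4 + X^3 + X + 1); addition is XOR


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 8:
            a ^= MODULUS
    return r


def bootstrap_share(x: int, n: int, i: int = 0):
    """Honest P_i shares bit x; returns (nu shares, P_i's MAC mu, global key delta)."""
    delta = [secrets.randbits(8) for _ in range(n)]  # key shares delta_s
    eps = secrets.randbits(8)                         # P_i's random epsilon
    nu = [0] * n
    nu[i] = eps ^ gf_mul(x, delta[i])                 # nu_i = epsilon + x * delta_i
    mu = eps
    for s in range(n):
        if s == i:
            continue
        r = secrets.randbits(1)               # random bit r_{i,s} from F_aBit
        pad = secrets.randbits(8)             # F_aBit key delta'_s
        key = secrets.randbits(8)             # P_s's key r'_{i,s}
        mac_old = key ^ gf_mul(r, pad)        # P_i's MAC under delta'_s
        sigma = delta[s] ^ pad                # P_s reveals sigma_s = delta_s + delta'_s
        mac = mac_old ^ gf_mul(r, sigma)      # key change: mu_{i,s} = r'_{i,s} + r * delta_s
        d = x ^ r                             # public padding d_s = x + r_{i,s}
        nu[s] = key ^ gf_mul(d, delta[s])     # nu_s = r'_{i,s} + d_s * delta_s
        mu ^= mac                             # mu = epsilon + sum_s mu_{i,s}
    total_delta = 0
    for d_s in delta:
        total_delta ^= d_s
    return nu, mu, total_delta


for bit in (0, 1):
    nu, mu, delta = bootstrap_share(bit, n=4)
    acc = 0
    for v in nu:
        acc ^= v
    print(acc ^ gf_mul(bit, delta) == mu)  # True: sum(nu) + x * delta = mu
```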






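As for what $\mathcal{Z}$ sees during the execution, both $\sigma_k$ and $d_s$, leaked by $\mathcal{F}_{\mathrm{AT}}$, look random since
they are paddings of $\delta_k$ and $x$ with fresh pads $\delta'_k$ and $r_{i,s}$ given by $\mathcal{F}_{\mathrm{aBit}}$ to $P_i$. Now, denote by $\delta_H$
the sum of the portion of $\delta$-shares that honest parties generated in Initialize of $\Pi_{\mathrm{Bootstrap}}$, and
let $\delta^*_{\mathcal{A}} = \sum_{j\in\mathcal{A}}(\sigma^*_j + \delta'_j)$. That is, $\delta^*_{\mathcal{A}}$ should match the sum of the corrupt portion of $\delta$-shares
generated in Initialize. Now, say $P_i$ inputs bit $x$ to $\Pi_{\mathrm{Bootstrap}}$; then the shares $\{\nu_k\}_{k\notin\mathcal{A}}$ are such that
$\sum_{k\notin\mathcal{A}} \nu_k = \mu + \sum_{j\in\mathcal{A}} \nu^*_j + x\cdot(\delta^*_{\mathcal{A}} + \delta_H)$. In other words, honest $\nu_k$ is consistent with both $\delta^*_{\mathcal{A}}$ (which the
adversary imposes via the $\sigma^*_j$'s) and $\nu^*_j$ (which the adversary is supposed to derive from the bits $d_j$),
and these shares are extracted by $\mathcal{S}$ in steps 1 and 3 respectively.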

Case dishonest $P_i$. In this case, $\mathcal{S}$ sends random $\sigma_k$ to $\mathcal{Z}$ on behalf of honest $P_k$. This is
indistinguishable from what is sent in a real run, as $P_k$ is using a padding given by $\mathcal{F}_{\mathrm{aBit}}$. As for what
$\mathcal{F}_{\mathrm{Bootstrap}}$ outputs to honest parties, we note again that, if $\mathcal{Z}$ gave correct $d^*_k$ to $\mathcal{S}$ using $\mathcal{F}_{\mathrm{AT}}$, the
sum of the honest portion of $\nu$-shares is equal to $\sum_{j\in\mathcal{A}}\big(r'_{ij} + (x + r_{ij})\cdot\delta^*_j\big) + x\cdot\delta_H + \mu^*$,
which is extracted by $\mathcal{S}$ in step 1. And if $\mathcal{Z}$ does not send the correct $d^*_k$, namely $d^*_k = x + r_{ik} + 1$,
it causes honest $P_k$ to compute the shifted share $\nu_k + \delta_k$, which is exactly what $\mathcal{S}$ tells $\mathcal{F}_{\mathrm{Bootstrap}}$ to
output in step 3. □

C.2  Functionality  and  Proof  of  the  Online  Phase  (Theorem  1) 


Functionality $\mathcal{F}_{\mathrm{Online}}$

Initialize: On input (init) the functionality activates and waits for an input from the environment. Then it
does the following: if it receives Abort, it waits for the environment to input a set of corrupted parties,
outputs it to the parties, and aborts; otherwise it continues.

Input: On input (input, $P_i$, varid, $x$) from $P_i$ and (input, $P_i$, varid, ?) from all other parties, with varid a fresh
identifier, the functionality stores (varid, $x$).

Add: On command (add, varid$_1$, varid$_2$, varid$_3$) from all parties (if varid$_1$, varid$_2$ are present in memory and
varid$_3$ is not), the functionality retrieves (varid$_1$, $x$), (varid$_2$, $y$) and stores (varid$_3$, $x + y$).

Multiply: On input (multiply, varid$_1$, varid$_2$, varid$_3$) from all parties (if varid$_1$, varid$_2$ are present in memory
and varid$_3$ is not), the functionality retrieves (varid$_1$, $x$), (varid$_2$, $y$) and stores (varid$_3$, $x \cdot y$).

Output: On input (output, varid) from all honest parties (if varid is present in memory), the functionality
retrieves (varid, $y$) and outputs it to the environment. The functionality then waits for an input from the
environment. If this input is Deliver then $y$ is output to all players. Otherwise $\emptyset$ is output to all
players.

Figure  12  Secure  Function  Evaluation 
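The following toy Python model of $\mathcal{F}_{\mathrm{Online}}$ mirrors Figure 12 for the binary case (values are assumed to be bits, so addition is XOR and multiplication is AND). The environment is modelled by a hypothetical callback `env` returning `"Abort"`, `"Deliver"` or any other string, and `None` stands for the output $\emptyset$; the class and method names are likewise hypothetical.

```python
from typing import Callable, Optional


class IdealOnline:
    """Toy model of F_Online (Figure 12), assuming all stored values are bits (F = F_2)."""

    def __init__(self, env: Callable[[str], str]):
        self.env = env  # stands in for the environment's decisions
        self.mem: dict[str, int] = {}
        self.active = False

    def initialize(self) -> None:
        # Initialize: the environment may force an abort before anything is computed.
        if self.env("init") == "Abort":
            raise RuntimeError("aborted during Initialize")
        self.active = True

    def input(self, varid: str, x: int) -> None:
        if not self.active or varid in self.mem:
            raise ValueError("functionality not active, or varid is not fresh")
        self.mem[varid] = x & 1

    def _combine(self, v1: str, v2: str, v3: str, op: Callable[[int, int], int]) -> None:
        if v1 not in self.mem or v2 not in self.mem or v3 in self.mem:
            raise ValueError("operands must be stored and the target must be fresh")
        self.mem[v3] = op(self.mem[v1], self.mem[v2])

    def add(self, v1: str, v2: str, v3: str) -> None:
        self._combine(v1, v2, v3, lambda a, b: a ^ b)  # x + y over F_2

    def multiply(self, v1: str, v2: str, v3: str) -> None:
        self._combine(v1, v2, v3, lambda a, b: a & b)  # x * y over F_2

    def output(self, varid: str) -> Optional[int]:
        y = self.mem[varid]
        # y is shown to the environment first; it decides whether the players receive it.
        return y if self.env(f"output:{y}") == "Deliver" else None  # None models the empty output


f = IdealOnline(lambda tag: "Deliver")
f.initialize()
f.input("a", 1)
f.input("b", 1)
f.input("c", 0)
f.add("a", "c", "d")       # d = a + c
f.multiply("d", "b", "e")  # e = d * b
print(f.output("e"))  # 1
```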


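We construct a simulator $\mathcal{S}$ such that an environment $\mathcal{Z}$ corrupting up to $n - 1$ parties cannot
distinguish whether it is playing with $\Pi_{\mathrm{Online}}$ attached to $\mathcal{F}_{\mathrm{Prep}}$ and $\mathcal{F}_{\mathrm{Comm}}$, or with the
simulator $\mathcal{S}$ and $\mathcal{F}_{\mathrm{Online}}$. We start by describing the behaviour of the simulator $\mathcal{S}$: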

- The simulation of the Initialize procedure is performed by running a copy of $\mathcal{F}_{\mathrm{Prep}}$ on query Init.
All the data of the corrupted parties are known to the simulator. If $\mathcal{Z}$ inputs Abort to the copy
of $\mathcal{F}_{\mathrm{Prep}}$, then the simulator does the same to $\mathcal{F}_{\mathrm{Online}}$ and forwards the output of $\mathcal{F}_{\mathrm{Online}}$ to $\mathcal{Z}$:
if $\mathcal{F}_{\mathrm{Online}}$ outputs Abort, the simulator waits for $\mathcal{Z}$ to input a set of corrupted parties,
forwards it to $\mathcal{F}_{\mathrm{Online}}$, and aborts; otherwise it uses $\mathcal{Z}$'s inputs as preprocessed data.

- In the Input stage the simulator does the following. For the honest parties this step is run
correctly with dummy inputs; it reads the inputs of corrupted parties specified by $\mathcal{Z}$. Then
the simulator runs a copy of the Share command of $\mathcal{F}_{\mathrm{Prep}}$, sending back sharings $[x]_\alpha$ such that
$i \in \mathcal{A}$, where $\mathcal{A}$ is the set of corrupted parties. When $\mathcal{Z}$ writes the outputs corresponding to the
corrupted parties, the simulator writes these values on the influence port of $\mathcal{F}_{\mathrm{Online}}$ as inputs.

- The procedures Add and Multiply are performed according to the protocol, and the simulator calls
the respective procedures of $\mathcal{F}_{\mathrm{Online}}$.

- In the Output step, the functionality $\mathcal{F}_{\mathrm{Online}}$ outputs $y$ to $\mathcal{S}$. Now the simulator has to
provide shares of honest parties such that they are consistent with $y$. It knows an output value
$y'$ computed using the dummy inputs for the honest parties, so it can select a random honest
player and modify its share by adding $y - y'$ and modify the MAC by adding $\alpha(y - y')$, which is
possible for the simulator, since it knows $\alpha$. After that the simulator opens $y$ as in the protocol.
If $y$ passes the check, the simulator sends Deliver to $\mathcal{F}_{\mathrm{Online}}$.
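The Output step of the simulator can be sketched in a few lines. Assuming a toy MAC field $\mathrm{GF}(2^8)$, XOR-shared bits, and hypothetical helper names, the simulator adds $y - y'$ (an XOR in characteristic 2) to one random honest share and $\alpha(y - y')$ to the matching MAC share, so the opened value and its MAC stay consistent.

```python
import secrets

MODULUS = 0x11B  # toy field GF(2^8); the MAC key alpha lives here


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 8:
            a ^= MODULUS
    return r


def xor_share(v: int, n: int, bits: int) -> list[int]:
    parts = [secrets.randbits(bits) for _ in range(n - 1)]
    last = v
    for p in parts:
        last ^= p
    return parts + [last]


def xor_sum(vals) -> int:
    acc = 0
    for v in vals:
        acc ^= v
    return acc


def patch_output(y_sh, mac_sh, alpha, y, honest):
    """Shift one random honest share by y - y' and its MAC share by alpha * (y - y')."""
    diff = y ^ xor_sum(y_sh)  # y - y' in characteristic 2
    k = secrets.choice(honest)
    y_sh, mac_sh = list(y_sh), list(mac_sh)
    y_sh[k] ^= diff
    mac_sh[k] ^= gf_mul(alpha, diff)
    return y_sh, mac_sh


alpha = secrets.randbits(8)
y_dummy = 0  # y' computed from the dummy honest inputs
y_sh = xor_share(y_dummy, 4, bits=1)
mac_sh = xor_share(gf_mul(y_dummy, alpha), 4, bits=8)
y_sh, mac_sh = patch_output(y_sh, mac_sh, alpha, y=1, honest=[0, 2])
print(xor_sum(y_sh), xor_sum(mac_sh) == alpha)  # 1 True
```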

All the steps of the protocol are perfectly simulated: during the initialization the simulator acts
as $\mathcal{F}_{\mathrm{Prep}}$; addition does not involve communication, while multiplication implies partial opening: in
the protocol, as well as in the simulation, this opening reveals uniform values. Also, MACs have
the same distributions in both the protocol and the simulation.

Finally, in the output stage, $\mathcal{Z}$ can see $y$ and the shares from honest parties, which are uniform
and compatible with $y$ and its MAC. Moreover, $y$ is a correct evaluation of the function on the inputs
provided by the parties in the input stage. The same happens in the protocol with overwhelming
probability, since the probability that a corrupted party is able to cheat in a MACCheck call is $2/|\mathbb{F}|$
(see Theorem 4). □

C.3  Proof  of  the  Preprocessing  (Theorem  2) 

The description of the simulator, denoted by $\mathcal{S}$, is provided in Figure 13. Define $T_{\mathrm{Real}}$ to be the
set of messages sent or received by corrupt parties together with the inputs and outputs of the
parties, in an execution of $\Pi_{\mathrm{Prep}}$ with $\mathcal{F}_{\mathrm{Bootstrap}}$ and $\mathcal{F}_{\mathrm{Comm}}$. Likewise define $T_{\mathrm{Ideal}}$ for an execution
of $\mathcal{F}_{\mathrm{Prep}}$ with $\mathcal{S}$. To prove UC security, we see $\mathcal{Z}$ as a distinguisher between the two systems, and
our aim is to show that

$$\left|\Pr[0 \leftarrow \mathcal{Z}(T_{\mathrm{Real}})] - \Pr[0 \leftarrow \mathcal{Z}(T_{\mathrm{Ideal}})]\right| < \mathrm{negl}(\kappa).$$

For this to hold, it is enough to show that $\mathcal{Z}$ receives as inputs transcripts $T_{\mathrm{Real}}$, $T_{\mathrm{Ideal}}$ that are
statistically indistinguishable. We argue as follows.

First note that transcripts generated on calls to Initialize and Share in both executions are
perfectly indistinguishable, as they are nothing but calls to $\mathcal{F}_{\mathrm{Bootstrap}}$ in the real case, with identical
behaviour of the Share command in $\mathcal{F}_{\mathrm{Prep}}$ (and $\mathcal{S}$ only forwards queries to $\mathcal{F}_{\mathrm{Prep}}$).

We turn now to the GaOT command. Let $\mathrm{OT}_{\mathrm{out}} = \{[e], [z], [x_0], [x_1]\}$ be the quadruple that
honest parties are hoping to output if no abort occurs. Define the “multiplicative relation” $m = z + x_0 + e \cdot (x_0 + x_1)$,
and say that $\mathrm{OT}_{\mathrm{out}}$ is bad if $m = 1$. Thus, bad quadruples are those that
implement the multiplication gate incorrectly. Additionally, say that a quadruple is noauth if $\mathcal{Z}$ sent
to $\mathcal{F}_{\mathrm{Bootstrap}}(\alpha)$ the flag shift-$P_k$ set to true for at least one honest party $P_k$ during the execution of
AuthenticateOT.
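The predicate "bad" can be checked exhaustively. Over $\mathbb{F}_2$, $x_0 + e \cdot (x_0 + x_1)$ equals $x_e$, so $m = 0$ exactly when $z = x_e$, i.e. when the quadruple is a correct OT. The function names below are hypothetical.

```python
from itertools import product


def multiplicative_relation(e: int, z: int, x0: int, x1: int) -> int:
    """m = z + x0 + e * (x0 + x1) over F_2."""
    return z ^ x0 ^ (e & (x0 ^ x1))


def is_bad(e: int, z: int, x0: int, x1: int) -> bool:
    return multiplicative_relation(e, z, x0, x1) == 1


# A quadruple is good exactly when z equals the selected message x_e.
for e, z, x0, x1 in product((0, 1), repeat=4):
    assert is_bad(e, z, x0, x1) == (z != (x1 if e else x0))
print(sum(is_bad(*q) for q in product((0, 1), repeat=4)))  # 8
```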

Indistinguishability of transcripts. First notice that $T_{\mathrm{Real}}$ and $T_{\mathrm{Ideal}}$ truncated up to the point
where the parties output the quadruple are perfectly indistinguishable (steps 12 and 5, respectively):
looking at Figure 13, we see that $\mathcal{S}$ sacrifices quadruples exactly as $\Pi_{\mathrm{Prep}}$ does. More precisely, step 5
of $\mathcal{S}$ mimics steps 8-12 of $\Pi_{\mathrm{Prep}}$. Moreover, $\mathcal{S}$ uses quadruples generated in step 3, and honest
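The Simulator of $\Pi_{\mathrm{Prep}}$
The set of corrupt parties is denoted by $\mathcal{A}$.

Initialize: $\mathcal{S}$ forwards to $\mathcal{F}_{\mathrm{Prep}}$ the query (Init) together with the $\{\alpha_j\}_{j\in\mathcal{A}}$ that $\mathcal{Z}$ sends to $\mathcal{F}_{\mathrm{Bootstrap}}$. It then samples
a random $\alpha \in \mathbb{F}$ and a set of sharings $\{\alpha_k\}_{k\notin\mathcal{A}}$ consistent with $\{\alpha_j\}_{j\in\mathcal{A}}$ and $\alpha$, but otherwise random. It
stores the complete sharing for later use.

Share: $\mathcal{S}$ forwards to $\mathcal{F}_{\mathrm{Prep}}$ the query $(i, \text{Share})$ of $\mathcal{Z}$ to $\mathcal{F}_{\mathrm{Bootstrap}}$. $\mathcal{S}$ also gets flags $\{\text{shift-}P_k\}_{k\notin\mathcal{A}}$ and MAC
shares $\{\nu_j\}_{j\in\mathcal{A}}$ from $\mathcal{Z}$. If $i \in \mathcal{A}$, $\mathcal{Z}$ specifies input bit $x$. $\mathcal{S}$ sends shift flags, MAC shares and (possibly)
input $x$ to $\mathcal{F}_{\mathrm{Prep}}$.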

GaOT:

1. In steps 1 and 3, when $\mathcal{Z}$ thinks it is querying $\mathcal{F}_{\mathrm{Bootstrap}}$ on commands Init and Share, respectively, $\mathcal{S}$
discards all the values received from $\mathcal{Z}$.

2. Steps 2, 4, 5 are local, and $\mathcal{S}$ does nothing.

3. Steps 6-7 are repeated four times, once for each symbol $y \in \{e, z, x_0, x_1\}$. In each invocation $\mathcal{S}$ does:

- During the $i$-th query to the Share command of $\mathcal{F}_{\mathrm{Bootstrap}}$, $\mathcal{S}$ receives from $\mathcal{Z}$ bits and MAC
shares $\{\nu_j(y_i) \in \mathbb{F}\}_{j\in\mathcal{A}}$, and flags $\{\text{shift-}P_k^{(i)}\}_{k\notin\mathcal{A}}$. It also receives bit $y_i$ and $\mu(y_i) \in \mathbb{F}$, if $i \in \mathcal{A}$.

- After the $n$ queries are done, $\mathcal{S}$ sets the data of each representation $[y_i]_\alpha$ corresponding to honest
parties exactly as $\mathcal{F}_{\mathrm{Bootstrap}}$ would do. Thus, if $i \notin \mathcal{A}$, $\mathcal{S}$ samples $y_i \in \mathbb{F}_2$ and $\mu(y_i) \in \mathbb{F}$ at random;
otherwise it uses $\mathcal{Z}$'s choice. It sets $\nu(y_i) = \mu(y_i) + y_i \cdot \alpha$ and prepares sharings $\langle y_i\rangle$, $\langle\nu(y_i)\rangle$ where the
honest shares $\nu_k(y_i)$ are consistent with $\mathcal{Z}$'s shares. Finally, $\mathcal{S}$ shifts the honest share to $\nu_k(y_i) + \alpha_k$
if $\text{shift-}P_k^{(i)}$ is true. The honest data on the joint representation $[y]$ is generated as one expects,
where $y = \sum_{i\in\mathcal{P}} y_i$.

4. The above steps are repeated at least $\kappa + 1$ times, as in $\Pi_{\mathrm{Prep}}$.

5. Steps 8-12 are performed as in $\Pi_{\mathrm{Prep}}$, where $\mathcal{S}$ acts on behalf of honest parties using the dummy quadruples
generated in the executions of step 3. It also answers queries from $\mathcal{Z}$ to the Comm and Open commands
of $\mathcal{F}_{\mathrm{Comm}}$. Openings on behalf of honest parties are set to random seed values.

6. If some iteration in the previous step results in abort, $\mathcal{S}$ inputs Abort to $\mathcal{F}_{\mathrm{Prep}}$. Otherwise, it inputs Continue,
and for each bit $y \in \{e, z, x_0, x_1\}$ of the checked quadruple, $\mathcal{S}$ discards the shift flags, and gives bit shares
$\{y_j\}_{j\in\mathcal{A}}$ and MAC shares $\{\nu_j(y)\}_{j\in\mathcal{A}}$ derived in step 3 to $\mathcal{F}_{\mathrm{Prep}}$.

Figure 13 The Simulator of $\Pi_{\mathrm{Prep}}$


parties use quadruples generated in steps 6-7. These quadruples are identically distributed because
$\mathcal{S}$ proceeds exactly as $\mathcal{F}_{\mathrm{Bootstrap}}$ does. Also, notice that in $\Pi_{\mathrm{Prep}}$ the output quadruples are those that
parties choose to authenticate, and hence $\mathcal{S}$ skips the simulation of ShareOT (besides accepting
$\mathcal{Z}$'s queries), since no outgoing communication from either $\mathcal{F}_{\mathrm{Bootstrap}}$ or party-to-party is done.

Output indistinguishability. If $\mathcal{Z}$ is honest-but-curious, then a run with $\Pi_{\mathrm{Prep}}$ outputs a quadruple
that is neither bad nor noauth. This follows from the correctness of the ShareOT and AuthenticateOT
steps. Also, in step 3, $\mathcal{S}$ is able to extract the portion of shares of $\mathrm{OT}_{\mathrm{out}}$ corresponding to
corrupt parties, and give them to $\mathcal{F}_{\mathrm{Prep}}$. We therefore conclude that the outputs in both worlds are
identically distributed. On the other hand, if $\mathcal{Z}$ misbehaves in an arbitrary way, it suffices to show
the following to conclude the proof:

$$\mathrm{OT}_{\mathrm{out}} \text{ is bad} \vee \text{noauth} \;\Longrightarrow\; \Pi_{\mathrm{Prep}} \text{ outputs } \emptyset \text{ with probability } 1 - \mathrm{negl}(\kappa).$$

We argue as follows: the sacrifice step is run by the honest parties. Therein, in the $i$-th iteration,
a fresh check quadruple $\mathrm{OT}_i$ is taken and honest parties reveal a linear combination of their portion
of the shares of $\mathrm{OT}_{\mathrm{out}}$ and $\mathrm{OT}_i$, which opens to $p_i$, $q_i$ and $c_i$. Suppose $\mathcal{Z}$ started with input shares that render
an $\mathrm{OT}_{\mathrm{out}}$ that is noauth, or chooses to reveal something different; say, w.l.o.g., that the first opening gives
a wrong $p_i$. Then he managed either to pass $\Pi_{\mathrm{MACCheck}}$ on the opened values with $p_i$ not authenticated,
or to authenticate $p_i$ and feed it to $\Pi_{\mathrm{MACCheck}}$. The former happens with probability $2/|\mathbb{F}|$
by Theorem 4 (assuming $\mathrm{PRF}^{\mathbb{F},t}_s(\cdot)$ is a pseudorandom function), and the latter is equivalent to $\mathcal{Z}$ holding the field element
$\mu_H + p_i\cdot\alpha_H = \sum_{k\notin\mathcal{A}}\big(\mu_k(p_i) + p_i\cdot\alpha_k\big)$, which happens with probability $1/|\mathbb{F}|$, since $\mu_H + p_i\cdot\alpha_H$ is
only derivable from the private transcripts of honest parties (thus, $\mathcal{Z}$ must guess it). We conclude
that, if $\Pi_{\mathrm{MACCheck}}$ passes, the probability that $\mathcal{Z}$ misbehaved in the sacrifice step, or input shares that render
an $\mathrm{OT}_{\mathrm{out}}$ that is noauth, is bounded by $2/|\mathbb{F}| = 2^{-\kappa+1}$. Now, it is easy to see that if $\mathcal{Z}$
follows the sacrifice step, then we can write $c_i = t_i\cdot m + m'_i$, where $m'_i$ is the multiplicative relation
of $\mathrm{OT}_i$. Therefore, if $\mathcal{Z}$ misbehaved in any previous step, yielding a bad $\mathrm{OT}_{\mathrm{out}}$ (i.e. $m = 1$), then $c_i = t_i + m'_i$.
In this way, if the sacrifice step passes, we can write $\mathbf{t} = \mathbf{m}'$, where $\mathbf{t}$ is the challenge vector. This
vector is randomly sampled from $\mathbb{F}_2^{\kappa}$, assuming $\mathrm{PRF}^{\mathbb{F}_2,\kappa}(\cdot)$, thus the probability of having $\mathbf{t}$ fixed to
$\mathbf{m}'$ is $2^{-\kappa}$.

Summing up, bad or noauth output quadruples will pass both tests with probability at most
$2^{-\kappa+1}$. This concludes the proof of the theorem. □
Abstract. We present a secure honest majority MPC protocol, against a static adversary, which aims to reduce the
communication cost in the situation where there are a large number of parties and the number of adversarially controlled
parties is relatively small. Our goal is to reduce the usage of point-to-point channels among the parties, thus
enabling them to run multiple different protocol executions. Our protocol has highly efficient theoretical communication
cost when compared with other protocols in the literature; specifically the circuit-dependent communication
cost,  for  circuits  of  suitably  large  depth,  is  (!?(|ckt|K^),  for  security  parameter  k  and  circuit  size  |ckt|.  Our  protocol 
finds application in cloud computing scenarios, where the fraction of corrupted parties is relatively small. By minimizing
the usage of point-to-point channels, our protocol can enable a cloud service provider to run multiple MPC
protocols.


1  Introduction 

Threshold  secure  multi-party  computation  (MPC)  is  a  fundamental  problem  in  secure  distributed  computing. 
It  allows  a  set  of  n  mutually  distrusting  parties  with  private  inputs  to  “securely”  compute  any  publicly  known 
function  of  their  private  inputs,  even  in  the  presence  of  a  centralized  adversary  who  can  control  any  t  out  of 
the n parties and force them to behave in any arbitrary manner. Now consider a situation where n is very
large,  say  n  >  1000  and  the  proportion  of  corrupted  parties  (namely  the  ratio  t/n)  is  relatively  small,  say  5 
percent.  In  such  a  scenario,  involving  all  the  n  parties  to  perform  an  MPC  calculation  is  wasteful,  as  typical 
(secret-sharing  based)  MPC  protocols  require  all  parties  to  simultaneously  transmit  data  to  all  other  parties. 
However,  restricting  to  a  small  subset  of  parties  may  lead  to  security  problems.  In  this  paper  we  consider 
the  above  scenario  and  show  how  one  can  obtain  a  communication  efficient,  robust  MPC  protocol  which 
is  actively  secure  against  a  computationally  bounded  static  adversary.  In  particular  we  present  a  protocol  in 
which  the  main  computation  is  performed  by  a  “smallish”  subset  of  the  parties,  with  the  whole  set  of  parties 
used occasionally so as to “checkpoint” the computation. Not utilizing the entire set of parties all the time
enables them to run many MPC calculations at once. The main result we obtain in the paper is as follows:

Main Result (Informal): Let $t/n = \epsilon$ with $0 < \epsilon < 1/2$, and let the $t$ corrupted parties be under
the control of a computationally bounded static adversary. Then for a security parameter $\kappa$ (for
example $\kappa = 80$ or $\kappa = 128$), there exists an MPC protocol with the following circuit-dependent
communication  complexity'^  to  evaluate  an  arithmetic  circuit  ckt:  (a).  0(|ckt|  •  k^)  for  ckt  with  depth 
(b).  0(|ckt|  •  k"^)  for  ckt  with  d  =  uj{t)  and  w  =  uj{k^)  (i.e.  |ckt|  =  tu(K^f)). 

Protocol Overview: We make use of two secret-sharing schemes. The first, $[\cdot]$, is an
actively-secure variant of the Shamir secret-sharing scheme [28] with threshold $t$. This first secret-sharing

¹ The communication complexity of an MPC protocol has two parts: a circuit-dependent part, dependent on the circuit size, and a
circuit-independent part. The focus is on the circuit-dependent communication, based on the assumption that the circuit is large
enough that the terms independent of the circuit size can be ignored; see for example [11, 4, 12, 5].
scheme is used to share values amongst all of the $n$ parties. The second secret-sharing scheme, $\langle\cdot\rangle$, is an
actively-secure variant of an additive secret-sharing scheme amongst a well-defined subset $C$ of the parties.

Assuming the inputs to the protocol are $[\cdot]$-shared amongst the parties at the start of the protocol, we
proceed as follows. We first divide ckt into $L$ levels, where each level consists of a sub-circuit. The computation
now proceeds in $L$ phases; we describe phase $i$. At the start of phase $i$ all $n$ parties
hold $[\cdot]$ sharings of the inputs to level $i$. The $n$ parties then select (at random) a committee $C$ of size $c$. If $c$ is
such that $\epsilon^c < 2^{-\kappa}$, then statistically the committee $C$ will contain at least one honest party, as the inequality
implies that the probability that the committee contains no honest party is negligibly small. The $n$ parties
then engage in a “conversion” protocol so that the input values to level $i$ are now $\langle\cdot\rangle$-shared amongst the
committee. The committee $C$ then engages in an actively-secure dishonest-majority² MPC protocol to evaluate
the sub-circuit at level $i$. If no abort occurs during the evaluation of the $i$-th sub-circuit, then the parties
engage in another “conversion” protocol so that the output values of the sub-circuit are converted from a $\langle\cdot\rangle$
sharing amongst members of $C$ to a $[\cdot]$ sharing amongst all $n$ parties. This step amounts to check-pointing
data. It ensures that the inputs to all the subsequent sub-circuits are saved in the form of a $[\cdot]$ sharing, which
guarantees recoverability as long as $0 < \epsilon < 1/2$. So the check-pointing avoids re-evaluating the entire
circuit from scratch after every abort of the dishonest-majority MPC protocol.
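The committee-size condition can be evaluated directly. Under the text's assumption that a committee consists only of corrupt parties with probability $\epsilon^c$, the smallest admissible size is $c = \lfloor \kappa / \log_2(1/\epsilon) \rfloor + 1$, which depends on $\epsilon$ and $\kappa$ but not on $n$. The function name is hypothetical.

```python
import math


def committee_size(eps: float, kappa: int) -> int:
    """Smallest integer c with eps**c < 2**(-kappa), i.e. c * log2(1/eps) > kappa."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    return math.floor(kappa / math.log2(1 / eps)) + 1


print(committee_size(0.05, 80), committee_size(0.05, 128))  # 19 30
```

For example, with $\epsilon = 0.05$ (the 5 percent scenario above) this gives $c = 19$ for $\kappa = 80$ and $c = 30$ for $\kappa = 128$.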

If, however, an abort occurs while evaluating the $i$-th sub-circuit, then we determine a pair of parties from
the committee $C$, one of whom is guaranteed to be corrupted, eliminate the pair from the set of active
parties, and re-evaluate the sub-circuit. In fact, cheating can also occur in the $\langle\cdot\rangle \leftrightarrow [\cdot]$ conversions,
and we need to deal with these as well. Thus, if errors are detected, we need to repeat the evaluation of the
sub-circuit at level $i$. Since there are at most $t$ bad parties, the total amount of backtracking (i.e. evaluating
a sub-circuit already computed) that needs to be done is bounded by $t$. For large $n$ and small $t$ this provides
an asymptotically efficient protocol.
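The bound on backtracking can be illustrated with a toy simulation. It is not the protocol: it assumes a worst-case adversary that aborts whenever a committee contains a corrupt party, and it models the cheater identification of [12] simply as removing that corrupt party together with one other committee member. Each abort removes at least one corrupt party, so the number of re-evaluations never exceeds $t$. All names are hypothetical.

```python
import random


def count_backtracks(n: int, t: int, levels: int, c: int, rng: random.Random) -> int:
    """Toy model of level-by-level evaluation with party elimination (requires c >= 2)."""
    active = list(range(n))
    corrupt = set(range(t))  # hypothetical: parties 0..t-1 are corrupt
    backtracks, level = 0, 0
    while level < levels:
        committee = rng.sample(active, c)
        cheaters = [p for p in committee if p in corrupt]
        if not cheaters:
            level += 1  # sub-circuit evaluated and its outputs checkpointed
            continue
        bad = cheaters[0]
        partner = next(p for p in committee if p != bad)  # other member of the identified pair
        for p in (bad, partner):
            active.remove(p)
            corrupt.discard(p)
        backtracks += 1  # re-evaluate the same level from its checkpoint
    return backtracks


print(count_backtracks(n=1000, t=50, levels=50, c=19, rng=random.Random(1)))
```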

The main technical difficulty is in providing actively-secure conversions between the two secret-sharing
schemes, and providing a suitable party-elimination strategy for the dishonest-majority MPC protocol. The
party-elimination strategy we employ follows from standard techniques, as long as we can identify the pair
of parties. This requirement, of a dishonest-majority MPC protocol which enables identification of cheaters
without sacrificing privacy, leads us to the utilization of the protocol in [12]. This results in us needing to
use double-trapdoor homomorphic commitments as a basic building block. To ensure greater asymptotic
efficiency we apply two techniques: (a) the check-pointing is done among a set of parties that assures
honest majority with overwhelming probability; (b) we apply the packing technique from [20] to our Shamir-based
secret sharing.

To obtain an efficient protocol one needs to select $L$ carefully; if $L$ is too small then the sub-circuits are large and
so the cost of returning to a prior checkpoint will also be large. If however $L$ is too large then we will need
to checkpoint a lot, and hence involve all $n$ parties in the computation at a lot of stages (thus requiring
all $n$ parties to be communicating/computing). The optimal value of $L$ for our protocol turns out to be $t$.

Related Work: The circuit-dependent communication complexity of the traditional MPC protocols in the
honest-majority setting is $\mathcal{O}(|\mathsf{ckt}| \cdot \mathrm{Poly}(n, \kappa))$; this informally stems from the fact that in these protocols we
require all the $n$ parties to communicate with each other for evaluating each gate of the circuit. Assuming
$0 < \epsilon < 1/2$, [11] presents a computationally secure MPC protocol with communication complexity
$\mathcal{O}(|\mathsf{ckt}| \cdot \mathrm{Poly}(\kappa, \log n, \log |\mathsf{ckt}|))$. The efficiency comes from the ability to pack and share several values
simultaneously, which in turn allows parallel evaluation of “several” gates simultaneously in a single round

² In the dishonest-majority setting, the adversary may corrupt all but one of the parties. An MPC protocol in this setting aborts if a
corrupted party misbehaves.
of communication. However, the protocol still requires communication between all the parties during each
round of communication. Our protocol reduces the need for the parties to be communicating with all others
at all stages in the protocol; moreover, asymptotically for large $n$ it provides a better communication
complexity than [11] (as there is no dependence on $n$), for circuits of suitably large depth as stated earlier.
However, the protocol of [11] is secure against a more powerful adaptive adversary.

In the literature, another line of investigation has been carried out in [6, 10, 14, 15] to beat the $\mathcal{O}(|\mathsf{ckt}| \cdot \mathrm{Poly}(n, \kappa))$
communication complexity bound of traditional MPC protocols, against a static adversary. The
main idea behind all these works is similar to ours: to involve “small committees” of parties for
evaluating each gate of the circuit, rather than involving all the $n$ parties. The communication complexity
of these protocols³ is of the order $\mathcal{O}(|\mathsf{ckt}| \cdot \mathrm{Poly}(\log n, \kappa))$. Technically our protocol differs from these
protocols in the following ways: (a) The committees in [6, 10, 14, 15] are of size $\mathrm{Poly}(\log n)$, which ensures
that with high probability the selected committees have honest majority. As a result, these protocols can run any
existing honest-majority MPC protocol among these small committees of $\mathrm{Poly}(\log n)$ size, which removes
the need to check-point the computation (as there will be no aborts). On the other hand, we only require
committees with at least one honest party, and our committee size is independent of $n$, thus providing better
communication complexity. Indeed, asymptotically for large $n$, our protocol provides a better communication
complexity than [6, 10, 14, 15] (as there is no dependence on $n$), for circuits of suitably large depth. (b)
Our protocol provides better fault-tolerance. Specifically, [14, 10, 6] require $\epsilon < 1/3$ and [15] requires
$\epsilon < 1/8$; on the other hand we require $\epsilon < 1/2$.

We stress that the committee selection protocol in [6, 10, 14, 15] is unconditionally secure and in the full-information model, where the corrupted parties can see all the messages communicated between the honest parties. On the other hand, our implementation of the committee selection protocol is computationally secure. The committee selection protocol in [6, 10, 14, 15] is inherited from [17]. The committee selection protocols in these works are rather involved and are not based on simply selecting a random subset of parties, possibly due to the challenges posed by the full-information model with unconditional security; this causes their committee size to be logarithmic in n. However, if one is willing to relax at least one of the above two features (i.e. the full-information model and unconditional security), then it may be possible to select committees with an honest majority in a simple way, by randomly selecting committees whose size may be independent of n. Investigating this, however, is beyond the scope of this paper.

Finally, we note that the idea of using small committees has been used earlier in the literature for various distributed computing tasks, such as leader election [23, 26], Byzantine agreement [24, 25] and distributed key generation [9].

On the Choice of ε: We select committees of size c satisfying ε^c < 2^{-κ}. This implies that the selected committee has at least one honest participant with overwhelming probability. We note that it is possible to randomly select committees of "larger" size so that with overwhelming probability the selected committee will have an honest majority. We label the protocol which samples a committee with an honest majority and then runs a computationally secure honest-majority MPC protocol (where aborts need not be handled) the "naive protocol". The naive protocol will have communication complexity O(|ckt| · Poly(κ)).

For "very small" values of ε, the committee size for the naive protocol is comparable to the committee size in our protocol. We demonstrate this with an example with n = 1000 and security level κ = 80. The committee sizes c required to ensure (i) at least one honest party in the committee and (ii) an honest majority in the committee, each with overwhelming probability (1 − 2^{-80}), for various choices of ε, are given in the following table:

⁸ Note that the protocol of [6] additionally uses FHE to achieve a communication complexity of O(Poly(log n)).
ε        c to obtain at least one honest party    c to obtain honest majority
1/3      48                                       448
1/4      39                                       250
1/10     23                                       84
1/100    11                                       20

From the table it is clear that when ε is closer to 1/2, the difference between the committee size needed to obtain at least one honest party and that needed to obtain an honest majority is large. As a result, selecting committees with an honest majority can be prohibitively expensive; thus our selection of small committees with a dishonest majority provides significant improvements.
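Numbers of this kind can be computed with exact arithmetic. The Python sketch below is an illustration under stated assumptions, not the computation behind the table: it takes t = ⌊ε · n⌋ corrupted parties, draws the committee uniformly without replacement (a hypergeometric distribution), and returns the smallest c for which the bad event (an all-corrupt committee, or at least ⌈c/2⌉ corrupt members) has probability below 2^{-κ}. The helper names `prob_at_least` and `min_committee` are hypothetical, and the rounding of t and the strict inequality are our choices, so results may differ by a small amount from the table.

```python
from fractions import Fraction
from math import comb, floor


def prob_at_least(k: int, c: int, n: int, t: int) -> Fraction:
    """Exact P[X >= k] for X ~ Hypergeometric(n parties, t corrupt, c drawn)."""
    hits = sum(comb(t, j) * comb(n - t, c - j) for j in range(k, min(c, t) + 1))
    return Fraction(hits, comb(n, c))


def min_committee(n: int, eps: Fraction, kappa: int, need_honest_majority: bool):
    """Smallest c whose 'bad' event has probability below 2**-kappa.

    Bad event: all c members corrupt (at-least-one-honest case), or at least
    ceil(c/2) members corrupt (honest-majority case).
    """
    t = floor(eps * n)                       # assumption: t is rounded down
    bound = Fraction(1, 2 ** kappa)
    for c in range(1, n + 1):
        k = (c + 1) // 2 if need_honest_majority else c
        if prob_at_least(k, c, n, t) < bound:
            return c
    return None
```

For example, `min_committee(1000, Fraction(1, 10), 80, True)` evaluates the honest-majority case for ε = 1/10 and κ = 80; the ε = 1/3 honest-majority case loops over a few hundred committee sizes and is noticeably slower.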

To see intuitively why our protocol selects smaller committees, consider the case when the security parameter κ tends to infinity: our protocol will require a committee of size roughly ε · n + 1, whereas the naive protocol will require a committee of size roughly 2 · ε · n + 1 (guaranteeing one honest member requires more than t = ε · n members, while guaranteeing an honest majority requires at least 2t + 1). Thus the naive method will use a committee roughly twice the size of ours. Hence, if small committees are what is required, then our method improves on the naive method.

For fixed ε and increasing n, we can apply the binomial approximation to the hypergeometric distribution, and see that our protocol will require a committee of size c ≈ κ/log_2(1/ε). To estimate the committee size for the naive protocol, we use the binomial distribution, writing F(b; c, ε) for the probability that we select at least b corrupt parties in a committee of size c, given that the probability of a party being corrupt is fixed at ε. To obtain an honest majority except with probability less than 2^{-κ}, we require F(c/2; c, ε) ≤ 2^{-κ}. By estimating F(c/2; c, ε) via Hoeffding's inequality we obtain

$$F(c/2;\, c, \epsilon) \le \exp\left(-2 \cdot \frac{(c \cdot \epsilon - c/2)^2}{c}\right) \le 2^{-\kappa},$$

which implies

$$2 \cdot \frac{(c \cdot \epsilon - c/2)^2}{c} \ge \kappa \cdot \log_e 2.$$

Solving for c gives us

$$c \ge \frac{2 \cdot \kappa \cdot \log_e 2}{(2 \cdot \epsilon - 1)^2}.$$

Thus, for fixed ε and large n, the number of parties in a committee is O(κ) for both our protocol and the naive protocol, so the communication complexity of our protocol and of the naive protocol is asymptotically the same. But since the committees in our protocol are always smaller than those in the naive protocol, we obtain an advantage when the ratio of the two committee sizes is large, i.e. when ε is larger.

The ratio between the committee size in the naive protocol and that of our protocol (assuming we are in a range where Hoeffding's inequality provides a good approximation) is roughly

$$\frac{-2 \cdot \log_e 2 \cdot \log_2 \epsilon}{(2 \cdot \epsilon - 1)^2}.$$


So for large n the ratio between the committee sizes of the two protocols depends on ε alone (and is independent of κ). By way of example, this ratio is approximately equal to 159 when ε = 0.45, 19 when ε = 1/3, 7 when ε = 1/10 and 9.6 when ε = 1/100, although the approximation via Hoeffding's inequality only really applies for ε close to 1/2.
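The two closed-form estimates and their ratio are easy to evaluate numerically. The following Python sketch (function names hypothetical) uses the binomial approximation c ≈ κ/log_2(1/ε) for our protocol and the Hoeffding bound c ≥ 2κ log_e 2/(2ε − 1)^2 for the naive protocol; both are approximations, with the caveats discussed above.

```python
import math


def our_committee_size(eps: float, kappa: int) -> float:
    """Binomial approximation: eps**c = 2**-kappa gives c = kappa / log2(1/eps)."""
    return kappa / math.log2(1 / eps)


def naive_committee_size(eps: float, kappa: int) -> float:
    """Hoeffding bound for an honest majority: c >= 2*kappa*ln(2) / (2*eps - 1)**2."""
    return 2 * kappa * math.log(2) / (2 * eps - 1) ** 2


def size_ratio(eps: float) -> float:
    """naive / ours = -2*ln(2)*log2(eps) / (2*eps - 1)**2; kappa cancels out."""
    return -2 * math.log(2) * math.log2(eps) / (2 * eps - 1) ** 2


for eps in (0.45, 1 / 3, 1 / 10, 1 / 100):
    print(f"eps = {eps:.3f}: ratio ~ {size_ratio(eps):.1f}")
```

For ε = 1/3 the formula evaluates to about 19.8, and for ε = 1/10 to about 7.2.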

This implies that for values of ε close to 1/2 our protocol will be an improvement on the naive protocol. However, the naive method does not have the extra cost of check-pointing which our method has; thus at some point the naive protocol will be more efficient. Our protocol is therefore perhaps most interesting when ε is not too small, say in the range [1/100, 1/2].
Possible Application of Our Protocol for Cloud Computing. Consider the situation of an organization performing a multi-party computation on a cloud infrastructure, which involves a large number of machines, with the number of corrupted parties possibly high, but not exceeding one half of the parties (which is exactly the situation considered in our MPC protocol). Using our MPC protocol, the whole computation can then be carried out by a small subset of machines, with the whole cloud infrastructure being used only for check-pointing the computation. By not utilizing the whole cloud infrastructure all the time, we enable the cloud provider to serve multiple MPC requests.

Our protocol is not adaptively secure. In fact, vulnerability to an adaptive adversary is inherent to most committee-based protocols for distributed computing tasks such as Leader Election [23, 26], Byzantine Agreement [24, 25], Distributed Key Generation [9] and MPC in [10, 14]. Furthermore, we feel that adaptive security is not required in the cloud scenario. An attacker external to the cloud data centre will have difficulty determining which computers are being used in the committee, and even greater difficulty compromising them adaptively. The main threat model in such a situation is via co-tenants (other users' processes) resident on the same physical machine. Since the precise machine upon which a cloud tenant sits is (essentially) randomly assigned, it is hard for a co-tenant adversary to mount a cross-Virtual Machine attack on a specific machine unless they are randomly assigned this machine by the cloud. Note that co-tenants have more adversarial power than a completely external attacker. A more accurate security model would be a form of adaptive security in which attackers proactively move from one machine to another, but in a random fashion. We leave the analysis of this complex situation to future work.

2 Model, Notation and Preliminaries

We denote by 𝒫 = {P_1, ..., P_n} the set of n parties, who are connected by pair-wise private and authentic channels. We assume that there exists a PPT static adversary 𝒜 who can maliciously corrupt any t parties from 𝒫 at the beginning of the execution of a protocol, where t = n · ε and 0 < ε < 1/2. There exists a publicly known randomized function f : 𝔽_p^n → 𝔽_p, expressed as a publicly known arithmetic circuit ckt over the field 𝔽_p of prime order p (including random gates to enable the evaluation of randomized functions), with party P_i having a private input x^{(i)} ∈ 𝔽_p for the computation. We let d and w denote the depth and (average) width of ckt respectively. The finite field 𝔽_p is assumed to be such that p is a prime, with p > max{n, 2^κ}, where κ is the computational security parameter. Apart from κ, we also have an additional statistical security parameter s, and the security offered by s (which is generally much smaller than κ) does not depend on the computational power of the adversary.

The security of our protocol(s) will be proved in the universal composability (UC) model. The UC framework allows for defining the security properties of cryptographic tasks so that security is maintained under general composition with an unbounded number of instances of arbitrary protocols running concurrently. In the framework, the security requirements of a given task are captured by specifying an ideal functionality run by a "trusted party" that obtains the inputs of the parties and provides them with the desired outputs. Informally, a protocol securely carries out a given task if running the protocol in the presence of a real-world adversary amounts to "emulating" the desired functionality. For more details, see Appendix A.

We do not assume a physical broadcast channel. Although our protocol uses an ideal broadcast functionality (Fig. 3) that allows a sender Sen ∈ 𝒫 to reliably broadcast a message to a group of parties 𝒳 ⊆ 𝒫, the functionality can be instantiated using point-to-point channels; see Appendix B.2 for details.

The communication complexity of our protocols has two parts: the communication done over the point-to-point channels and the broadcast communication. The latter is captured by BC(ℓ, |𝒳|), denoting that in total O(ℓ) bits are broadcast in the associated protocol to a set of parties of size |𝒳|. For details about the instantiation of ℱ_BC, see Appendix B.
Two different types of secret-sharing are employed in our protocols. The secret-sharings are inherently defined to include "verification information" of the individual shares in the form of publicly known commitments. We use a variant of the Pedersen homomorphic commitment scheme [27]. In our protocol, we require UC-secure commitments to ensure that a committer must know its committed value and cannot manipulate a commitment produced by other committers to violate what we call "input independence". It has been shown in [8] that a UC-secure commitment scheme is impossible to achieve without setup assumptions. The standard method to implement UC-secure commitments is in the Common Reference String (CRS) model, where it is assumed that the parties are provided with a CRS that is set up by a "trusted third party" (TTP). We follow [12], where the authors show how to build a multiparty UC-secure homomorphic commitment scheme (where multiple parties can act as committer) based on any double-trapdoor homomorphic commitment scheme.

Definition 1 (Double-trapdoor Homomorphic Commitment for 𝔽_p [12]). It is a collection of five PPT algorithms (Gen, Comm, Open, Equivocate, TDExtract) together with a homomorphic operation ⊙:

- Gen(1^κ) → (ck, τ_0, τ_1): the generation algorithm outputs a commitment key ck, along with trapdoors τ_0 and τ_1.

- Comm_ck(x; r_0, r_1) → C_{x,r_0,r_1}: the commitment algorithm takes a message x ∈ 𝔽_p and randomness r_0, r_1 from the commitment randomness space ℛ⁹ and outputs a commitment C_{x,r_0,r_1} of x under the randomness r_0, r_1.

- Open_ck(C, (x; r_0, r_1)) → {0, 1}: the opening algorithm takes a commitment C, along with a message/randomness triplet (x, r_0, r_1), and outputs 1 if C = Comm_ck(x; r_0, r_1), else 0.

- Equivocate(C_{x,r_0,r_1}, x, r_0, r_1, x̃, τ_i) → (r̃_0, r̃_1) ∈ ℛ: using one of the trapdoors τ_i with i ∈ {0, 1}, the equivocation algorithm can open a commitment C_{x,r_0,r_1} with any message x̃ ≠ x, with randomness r̃_0 and r̃_1 where r̃_{1−i} = r_{1−i}.

- TDExtract(C, x, r_0, r_1, x̃, r̃_0, r̃_1, τ_i) → τ_{1−i}: using one of the trapdoors τ_i with i ∈ {0, 1} and two different message/randomness triplets for the same commitment, namely (x, r_0, r_1) and (x̃, r̃_0, r̃_1), the trapdoor extraction algorithm can find the other trapdoor τ_{1−i} if r_{1−i} ≠ r̃_{1−i}.

The commitments are homomorphic, meaning that Comm(x; r_0, r_1) ⊙ Comm(y; s_0, s_1) = Comm(x + y; r_0 + s_0, r_1 + s_1) and Comm(x; r_0, r_1)^c = Comm(c · x; c · r_0, c · r_1) for any publicly known constant c.

We require the following properties to be satisfied:

- Trapdoor Security: There exists no PPT algorithm 𝒜 such that 𝒜(1^κ, ck, τ_i) → τ_{1−i}, for i ∈ {0, 1}.

- Computational Binding: There exists no PPT algorithm 𝒜 with 𝒜(1^κ, ck) → (x, r_0, r_1, x̃, r̃_0, r̃_1) such that (x, r_0, r_1) ≠ (x̃, r̃_0, r̃_1) but Comm_ck(x; r_0, r_1) = Comm_ck(x̃; r̃_0, r̃_1).

- Statistical Hiding: For all x, x̃ ∈ 𝔽_p and r_0, r_1 ∈ ℛ, let (r̃_0, r̃_1) = Equivocate(C_{x,r_0,r_1}, x, r_0, r_1, x̃, τ_i), with i ∈ {0, 1}. Then Comm_ck(x; r_0, r_1) = Comm_ck(x̃; r̃_0, r̃_1) = C_{x,r_0,r_1}; moreover, the distributions of (r_0, r_1) and (r̃_0, r̃_1) are statistically close.

We will use the following instantiation of a double-trapdoor homomorphic commitment scheme, which is a variant of the standard Pedersen commitment scheme over a group G in which discrete logarithms are hard [12]. The message space is 𝔽_p and the randomness space is ℛ = 𝔽_p.

- Gen(1^κ) → ((G, p, g, h_0, h_1), τ_0, τ_1), where ck = (G, p, g, h_0, h_1) such that g, h_0, h_1 are generators of the group G of prime order p and g^{τ_i} = h_i for i ∈ {0, 1}.

⁹ For ease of presentation, we assume ℛ to be an additive group.
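- Comm_ck(x; r_0, r_1) = g^x h_0^{r_0} h_1^{r_1}, with x, r_0, r_1 ∈ 𝔽_p.

- Open_ck(C, (x, r_0, r_1)) → 1 if C = g^x h_0^{r_0} h_1^{r_1}; else Open_ck(C, (x, r_0, r_1)) → 0.

- Equivocate(C_{x,r_0,r_1}, x, r_0, r_1, x̃, τ_i) → (r̃_0, r̃_1), where r̃_{1−i} = r_{1−i} and r̃_i = τ_i^{−1}(x − x̃) + r_i.

- TDExtract(C, x, r_0, r_1, x̃, r̃_0, r̃_1, τ_i) → τ_{1−i}, where, if r̃_{1−i} ≠ r_{1−i}, then

$$\tau_{1-i} = \frac{x - \tilde{x} + \tau_i (r_i - \tilde{r}_i)}{\tilde{r}_{1-i} - r_{1-i}}.$$

- The homomorphic operation ⊙ is just the group operation, i.e.

$$\mathrm{Comm}(x; r_0, r_1) \odot \mathrm{Comm}(\tilde{x}; \tilde{r}_0, \tilde{r}_1) = g^{x} h_0^{r_0} h_1^{r_1} \cdot g^{\tilde{x}} h_0^{\tilde{r}_0} h_1^{\tilde{r}_1} = g^{x+\tilde{x}} h_0^{r_0+\tilde{r}_0} h_1^{r_1+\tilde{r}_1} = \mathrm{Comm}(x + \tilde{x}; r_0 + \tilde{r}_0, r_1 + \tilde{r}_1).$$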
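To make the instantiation concrete, here is a toy Python model of the double-trapdoor Pedersen variant. It is a sketch under loudly stated assumptions: the group is the order-p subgroup of ℤ_q^* with the tiny safe prime q = 2039 = 2 · 1019 + 1 and generator g = 4 (far too small to be secure), the function names are hypothetical, and the commitment argument of Equivocate is dropped because the toy computation does not need it. Modular inverses use the built-in `pow(a, -1, m)` (Python 3.8+).

```python
import secrets

# Toy parameters only (NOT secure): q = 2p + 1 is a safe prime and g = 4
# generates the subgroup of Z_q^* of prime order p = 1019.
Q, P, G = 2039, 1019, 4


def gen():
    """Gen: commitment key (g, h0, h1) and trapdoors tau_i with h_i = g^tau_i."""
    tau = [secrets.randbelow(P - 1) + 1 for _ in range(2)]  # tau_i in [1, p-1]
    ck = (G, pow(G, tau[0], Q), pow(G, tau[1], Q))
    return ck, tau


def comm(ck, x, r0, r1):
    """Comm_ck(x; r0, r1) = g^x * h0^r0 * h1^r1 (exponents reduced mod p)."""
    g, h0, h1 = ck
    return pow(g, x % P, Q) * pow(h0, r0 % P, Q) * pow(h1, r1 % P, Q) % Q


def open_commitment(ck, c, x, r0, r1):
    """Open_ck: 1 if (x, r0, r1) is a valid opening of c, else 0."""
    return 1 if c == comm(ck, x, r0, r1) else 0


def equivocate(x, r, x_new, tau_i, i):
    """Reopen the commitment to (x; r) as x_new using trapdoor tau_i.

    r_new[i] = tau_i^{-1} (x - x_new) + r[i]; r_new[1-i] = r[1-i] is unchanged.
    """
    r_new = list(r)
    r_new[i] = (pow(tau_i, -1, P) * (x - x_new) + r[i]) % P
    return r_new


def td_extract(x, r, x_new, r_new, tau_i, i):
    """Recover tau_{1-i} from two openings of one commitment (needs r_new[1-i] != r[1-i])."""
    num = (x - x_new + tau_i * (r[i] - r_new[i])) % P
    den = (r_new[1 - i] - r[1 - i]) % P
    return num * pow(den, -1, P) % P
```

Equivocating with τ_0 leaves r_1 unchanged, so TDExtract with τ_0 needs a second opening that differs in r_1, for instance one produced by equivocating with τ_1.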

We can now define the various types of secret-shared data used in our protocols. Let α_1, ..., α_n ∈ 𝔽_p be n publicly known, non-zero, distinct values, where α_i is associated with P_i as the evaluation point. The [·] sharing is the standard Shamir sharing [28], where the secret value will be shared among the set of parties 𝒫 with threshold t. Additionally, a commitment of each individual share will be available publicly, with the corresponding share-holder possessing the randomness of the commitment.

Definition 2 (The [·] Sharing). Let s ∈ 𝔽_p; then s is said to be [·]-shared among 𝒫 if there exist polynomials, say f(·), g(·) and h(·), of degree at most t, with f(0) = s, and every (honest) party P_i ∈ 𝒫 holds a share f_i = f(α_i) of s, along with opening information g_i = g(α_i) and h_i = h(α_i) for the commitment C_{f_i,g_i,h_i} = Comm_ck(f_i; g_i, h_i). The information available to party P_i ∈ 𝒫 as part of the [·]-sharing of s is denoted by [s]_i = (f_i, g_i, h_i, {C_{f_j,g_j,h_j}}_{P_j∈𝒫}). All parties will also have access to ck. Moreover, the collection of [s]_i's corresponding to P_i ∈ 𝒫 is denoted by [s].
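A [·]-sharing is Shamir sharing in which every share comes with a public Pedersen-style commitment. The Python sketch below (toy group, hypothetical function names and commitment key) deals f(·), g(·), h(·) of degree at most t, commits to each triple (f_i, g_i, h_i), and shows how the public commitments let a reconstructor discard inconsistent shares before Lagrange interpolation at 0.

```python
import secrets

# Toy group (NOT secure): order-p subgroup of Z_q^*, q = 2p + 1.
Q, P, G = 2039, 1019, 4
H0, H1 = pow(G, 123, Q), pow(G, 456, Q)  # hypothetical ck; trapdoors 123, 456


def comm(x, r0, r1):
    return pow(G, x % P, Q) * pow(H0, r0 % P, Q) * pow(H1, r1 % P, Q) % Q


def rand_poly(const, t):
    """Random polynomial of degree <= t over F_p, constant term first."""
    return [const % P] + [secrets.randbelow(P) for _ in range(t)]


def evaluate(poly, a):
    acc = 0
    for coeff in reversed(poly):  # Horner's rule
        acc = (acc * a + coeff) % P
    return acc


def bracket_share(s, t, alphas):
    """Deal [s]: party i gets (f(a_i), g(a_i), h(a_i)); all commitments are public."""
    f = rand_poly(s, t)
    g = rand_poly(secrets.randbelow(P), t)
    h = rand_poly(secrets.randbelow(P), t)
    shares = [(evaluate(f, a), evaluate(g, a), evaluate(h, a)) for a in alphas]
    commitments = [comm(*sh) for sh in shares]
    return shares, commitments


def reconstruct(points):
    """Lagrange interpolation at 0 from (alpha_i, f_i) pairs."""
    secret = 0
    for j, (aj, fj) in enumerate(points):
        num, den = 1, 1
        for m, (am, _) in enumerate(points):
            if m != j:
                num = num * (-am) % P
                den = den * (aj - am) % P
        secret = (secret + fj * num * pow(den, -1, P)) % P
    return secret


def robust_reconstruct(shares, commitments, alphas, t):
    """Drop shares that fail their public commitment, then interpolate t + 1 of the rest."""
    valid = [(a, sh[0]) for a, sh, c in zip(alphas, shares, commitments) if comm(*sh) == c]
    if len(valid) < t + 1:
        raise ValueError("not enough consistent shares")
    return reconstruct(valid[: t + 1])
```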

The second type of secret-sharing (which is a variation of additive sharing) is used to perform computation via a dishonest-majority MPC protocol amongst our committees.

Definition 3 (The ⟨·⟩ Sharing). A value s ∈ 𝔽_p is said to be ⟨·⟩-shared among a set of parties 𝒳 ⊆ 𝒫 if every (honest) party P_i ∈ 𝒳 holds a share s_i of s along with the opening information u_i, v_i for the commitment C_{s_i,u_i,v_i} = Comm_ck(s_i; u_i, v_i), such that Σ_{P_i∈𝒳} s_i = s. The information available to party P_i ∈ 𝒳 as part of the ⟨·⟩-sharing of s is denoted by ⟨s⟩_i = (s_i, u_i, v_i, {C_{s_j,u_j,v_j}}_{P_j∈𝒳}). All parties will also have access to ck. The collection of ⟨s⟩_i's corresponding to P_i ∈ 𝒳 is denoted by ⟨s⟩_𝒳.

It is easy to see that both types of secret-sharing are linear. For example, for the ⟨·⟩ sharing, given ⟨s^{(1)}⟩_𝒳, ..., ⟨s^{(ℓ)}⟩_𝒳 and publicly known constants c_1, ..., c_ℓ, the parties in 𝒳 can locally compute ⟨c_1 · s^{(1)} + ... + c_ℓ · s^{(ℓ)}⟩_𝒳. This follows from the homomorphic property of the underlying commitment scheme and the linearity of the secret-sharing scheme: each party P_i in 𝒳 can locally compute ⟨c_1 · s^{(1)} + ... + c_ℓ · s^{(ℓ)}⟩_i from ⟨s^{(1)}⟩_i, ..., ⟨s^{(ℓ)}⟩_i.
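The linearity of ⟨·⟩-sharings can be checked in a small model. The following Python sketch (toy group; names and commitment key hypothetical) additively shares two values with committed shares, lets each party combine its own share and opening information with public constants, and combines the public commitments via C^c, matching Comm(x; r_0, r_1)^c = Comm(c · x; c · r_0, c · r_1).

```python
import secrets

Q, P, G = 2039, 1019, 4                  # toy group, NOT secure
H0, H1 = pow(G, 321, Q), pow(G, 654, Q)  # hypothetical commitment key


def comm(x, u, v):
    return pow(G, x % P, Q) * pow(H0, u % P, Q) * pow(H1, v % P, Q) % Q


def angle_share(s, m):
    """<s>_X for |X| = m: additive shares s_i, openings (u_i, v_i), public commitments."""
    xs = [secrets.randbelow(P) for _ in range(m - 1)]
    xs.append((s - sum(xs)) % P)          # shares sum to s mod p
    info = [(x, secrets.randbelow(P), secrets.randbelow(P)) for x in xs]
    return info, [comm(*inf) for inf in info]


def local_lincomb(consts, sharings):
    """Party i combines only its own (s_i, u_i, v_i); commitments combine as prod C^c."""
    m = len(sharings[0][0])
    new_info, new_comms = [], []
    for i in range(m):
        s = sum(c * sh[0][i][0] for c, sh in zip(consts, sharings)) % P
        u = sum(c * sh[0][i][1] for c, sh in zip(consts, sharings)) % P
        v = sum(c * sh[0][i][2] for c, sh in zip(consts, sharings)) % P
        new_info.append((s, u, v))
        cm = 1
        for c, sh in zip(consts, sharings):
            cm = cm * pow(sh[1][i], c % P, Q) % Q
        new_comms.append(cm)
    return new_info, new_comms
```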




3 Main Protocol

We now present an MPC protocol implementing the standard honest-majority (meaning ε < 1/2) MPC functionality ℱ_f, presented in Figure 1, which computes the function f.

We now present the underlying idea of our protocol (outlined earlier in the introduction). The protocol is set in a variant of the player-elimination framework from [4]. During the computation, either pairs of parties,
Functionality ℱ_f

ℱ_f interacts with the parties in 𝒫 and the adversary 𝒮 and is parametrized by an n-input function f : 𝔽_p^n → 𝔽_p.

- Upon receiving (sid, i, x^{(i)}) from every P_i ∈ 𝒫, where x^{(i)} ∈ 𝔽_p, the functionality computes y = f(x^{(1)}, ..., x^{(n)}), sends (sid, y) to all the parties and the adversary 𝒮, and halts.

Fig. 1. The Ideal Functionality for Computing a Given Function f

each containing at least one actively corrupted party, or singletons of corrupted parties, are identified due to some adversarial behavior of the corrupted parties. These pairs, or singletons, are then eliminated from the set of eligible parties. To understand how we deal with the active corruptions, we need to define a dynamic set ℰ ⊆ 𝒫 of size n̂, which defines the current set of eligible parties in our protocol, and a threshold t̂, which defines the maximum number of corrupted parties in ℰ. Initially ℰ is set to be equal to 𝒫 (hence n̂ = n) and t̂ is set to t. We then divide the circuit ckt (representing f) to be evaluated into L levels, where each level consists of a sub-circuit of depth d/L; without loss of generality, we assume d to be a multiple of L. We denote the ith sub-circuit as ckt_i. At the beginning of the protocol, all the parties in 𝒫 verifiably [·]-share their inputs for the circuit ckt.

For evaluating a sub-circuit ckt_i, instead of involving all the parties in ℰ, we involve a small random committee 𝒞 ⊆ ℰ of c parties, where c is the minimum value satisfying the constraint ε^c < 2^{-κ}; recall ε = t/n. During the course of evaluating the sub-circuit, if any inconsistency is reported, then the (honest) parties in 𝒫 will identify either a single corrupted party or a pair of parties from ℰ where the pair contains at least one corrupted party. The identified party (or parties) is eliminated from ℰ and the value of t̂ is decremented by one, followed by re-evaluation of ckt_i by choosing a new committee from the updated set ℰ. This is reminiscent of the player-elimination framework from [4]; however, the way we apply the framework differs from the standard one. Specifically, in the standard player-elimination framework, the entire set of eligible parties ℰ is involved in the computation and player elimination is then performed over the entire ℰ, thus requiring a large amount of communication. In contrast, in our context only a small set of parties 𝒞 is involved in the computation, thus significantly reducing the communication complexity. It is easy to see that after a sequence of t failed sub-circuit evaluations, ℰ will be left with only honest parties, and so each sub-circuit will be evaluated successfully from then onwards.

Note that, given the way we eliminate parties, the fraction of corrupted parties in ℰ after any unsuccessful attempt at sub-circuit evaluation is upper bounded by the fraction of corrupted parties in ℰ prior to the evaluation of the sub-circuit. Specifically, let ε_old = t̂/n̂ be the fraction of corrupted parties in ℰ prior to the evaluation of a sub-circuit ckt_i, and let the evaluation fail, with either a single party or a pair of parties being eliminated from ℰ. Moreover, let ε_new be the fraction of corrupted parties in ℰ after the elimination. Then for single elimination we have ε_new = (t̂ − 1)/(n̂ − 1), and so ε_new < ε_old if and only if n̂ > t̂, which will always hold. On the other hand, for double elimination we have ε_new = (t̂ − 1)/(n̂ − 2), and so ε_new < ε_old if and only if n̂ > 2t̂, which will always hold.
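The claim that eliminations never increase the corrupted fraction can be sanity-checked with a toy simulation of the elimination loop. This Python sketch is not the protocol: it assumes a hypothetical adversary that causes a failure whenever the committee contains a corrupt party, after which either that party alone or that party together with another committee member is eliminated, and t̂ is decremented by one. It records t̂/n̂ after every elimination; the parameters must satisfy n ≥ 2t and leave at least c eligible parties after up to 2t eliminations.

```python
import random
from fractions import Fraction


def simulate(n, t, c, levels, seed=0):
    """Toy committee-based evaluation with player elimination (hypothetical adversary)."""
    rng = random.Random(seed)
    eligible = set(range(n))
    corrupt = set(range(t))            # WLOG parties 0..t-1 are corrupt
    t_hat = t
    ratios = [Fraction(t_hat, len(eligible))]
    failures = 0
    for _ in range(levels):
        while True:                    # re-evaluate the sub-circuit until it succeeds
            committee = rng.sample(sorted(eligible), c)
            bad = [p for p in committee if p in corrupt]
            if not bad:
                break                  # no corrupt member: evaluation succeeds
            failures += 1
            culprit = bad[0]
            if rng.random() < 0.5:
                eliminated = {culprit}                       # single corrupt party
            else:
                partner = rng.choice([p for p in committee if p != culprit])
                eliminated = {culprit, partner}              # pair, >= 1 corrupt
            eligible -= eliminated
            corrupt -= eliminated
            t_hat -= 1
            ratios.append(Fraction(t_hat, len(eligible)))
    return failures, ratios
```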

Since a committee 𝒞 (for evaluating a sub-circuit) is selected randomly, except with probability at most ε^c < 2^{-κ} the selected committee contains at least one honest party, and so the sub-circuit evaluation among 𝒞 needs to be performed via a dishonest-majority MPC protocol. We choose the MPC protocol of [12], since it can be modified to identify pairs of parties containing at least one corrupted party in the case of a failed evaluation, without violating the privacy of the honest parties. To use the protocol of [12] for sub-circuit evaluation, the corresponding sub-circuit inputs (available to the parties in 𝒫 in [·]-shared form) must be converted and made available in ⟨·⟩-shared form to the parties in 𝒞, and the parties in 𝒫 perform this conversion. After every successful evaluation of a sub-circuit via the dishonest-majority MPC protocol, the outputs of that sub-circuit (available in ⟨·⟩-shared form to the parties in the committee) are converted and saved in the form of a [·]-sharing among all the parties in 𝒫. As the set 𝒫 has an honest majority, [·]-sharing ensures robust reconstruction, implying that the shared values are recoverable. Since the inputs to a sub-circuit come either from the outputs of previous sub-circuit evaluations or from the original inputs, both of which are [·]-shared, a failed attempt at a sub-circuit evaluation does not require a re-evaluation of the entire circuit from scratch, but only a re-evaluation of that sub-circuit.

3.1 Supporting Functionalities

We now present a number of ideal functionalities defining sub-components of our main protocol; see Appendix B for the UC-secure instantiations of these functionalities.

Basic Functionalities: The functionality ℱ_CRS for generating the common reference string (CRS) for our main MPC protocol is given in Figure 2. The functionality outputs the commitment key of a double-trapdoor homomorphic commitment scheme, along with the encryption key of an IND-CCA secure encryption scheme (to be used later for UC-secure generation of completely random ⟨·⟩-shared values as in [12]; see Appendix D). The functionality ℱ_BC for group broadcast is given in Figure 3. This functionality broadcasts the message sent by a sender Sen ∈ 𝒫 to all the parties in a sender-specified set of parties 𝒳 ⊆ 𝒫; in our context, the set 𝒳 will always contain at least one honest party. The functionality ℱ_Committee for random committee selection is given in Figure 4. This functionality is parameterized by a value c; it selects a set 𝒳 of c parties at random from a specified set 𝒴 and outputs the selected set 𝒳 to the parties in 𝒫.

Functionality ℱ_CRS

ℱ_CRS interacts with the parties in 𝒫 and the adversary 𝒮 and is parameterized by κ.

- Upon receiving (sid, i) from every party P_i ∈ 𝒫, the functionality computes Gen(1^κ) → (ck, τ_0, τ_1) and G(1^κ) → (pk, sk), where G is the key-generation algorithm of an IND-CCA secure encryption schemeᵃ and Gen is the key-generation algorithm of a double-trapdoor homomorphic commitment scheme. The functionality then sets CRS = (ck, pk), sends (sid, i, CRS) to every party P_i ∈ 𝒫 and the adversary 𝒮, and halts.

ᵃ For use in the protocol of [12].

Fig. 2. The Ideal Functionality for Generating CRS


Functionality ℱ_BC

ℱ_BC interacts with the parties in 𝒫 and the adversary 𝒮.

- Upon receiving (sid, Sen, x, 𝒳) from the sender Sen ∈ 𝒫 such that 𝒳 ⊆ 𝒫, the functionality sends (sid, j, Sen, x) to every P_j ∈ 𝒳 and to the adversary 𝒮, and halts.

Fig. 3. The Ideal Functionality for Broadcast


Functionality Related to [·]-sharings: In Figure 5 we present the functionality ℱ_Gen[·], which allows a dealer D ∈ 𝒫 to verifiably [·]-share an already committed secret among the parties in 𝒫. The functionality is invoked when it receives three polynomials, say f(·), g(·) and h(·), from the dealer D, and a commitment, say C, supposedly the commitment of f(0) with randomness g(0), h(0) (namely C_{f(0),g(0),h(0)}), from the (majority of the) parties in 𝒫. The functionality then hands f_i = f(α_i), g_i = g(α_i), h_i = h(α_i) and the commitments {C_{f_j,g_j,h_j}}_{P_j∈𝒫} to P_i ∈ 𝒫 after 'verifying' that (a): all three polynomials are of degree at most



APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


Functionality ℱ_Committee

ℱ_Committee, parametrized by a constant c, interacts with the parties in 𝒫 and the adversary 𝒮.

- Upon receiving (sid, i, 𝒴) from every P_i ∈ 𝒫, the functionality selects c parties at random from the set 𝒴 that is received from the majority of the parties and denotes the selected set as 𝒳. The functionality then sends (sid, i, 𝒳) to every P_i ∈ 𝒫 and 𝒮, and halts.

Fig. 4. The Ideal Functionality for Selecting a Random Committee of Given Size c
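Fig. 4 can be read as a few lines of code. The Python sketch below is a hypothetical model of ℱ_Committee, not an instantiation (see Appendix B for those): it takes the set 𝒴 sent by a majority of parties, samples c members uniformly without replacement using `secrets.SystemRandom().sample`, and gives the same 𝒳 to every party and to the adversary.

```python
import secrets
from collections import Counter


def f_committee(c, received):
    """Toy model of F_Committee.

    received maps each party id i to the set Y it sent. The set sent by a
    majority is used; c members are sampled uniformly without replacement,
    and the same X goes to every party and to the adversary S.
    """
    counts = Counter(frozenset(y) for y in received.values())
    y, votes = counts.most_common(1)[0]
    if 2 * votes <= len(received):
        raise ValueError("no set Y was received from a majority of the parties")
    if c > len(y):
        raise ValueError("committee size exceeds |Y|")
    x = frozenset(secrets.SystemRandom().sample(sorted(y), c))
    return {i: x for i in received}, x   # (outputs to parties, output to S)
```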

t and (b): C = Comm_ck(f(0); g(0), h(0)), i.e. the value (and the corresponding randomness) committed in C is embedded in the constant terms of f(·), g(·) and h(·) respectively. If either of the above two checks fails, then the functionality returns Failure to the parties, indicating that D is corrupted.

In our MPC protocol, wherever ℱ_Gen[·] is called, the dealer will compute the commitment C as Comm_ck(f(0); g(0), h(0)) and will broadcast it prior to making the call to ℱ_Gen[·]. It is easy to see that ℱ_Gen[·] generates [f(0)] if D is honest or well-behaved. If ℱ_Gen[·] returns Failure, then D is indeed corrupted.

Functionality ℱ_Gen[·]

ℱ_Gen[·] interacts with the parties in 𝒫, a dealer D ∈ 𝒫, and the adversary 𝒮, and is parametrized by a commitment key ck of a double-trapdoor homomorphic commitment scheme, along with t.

- On receiving (sid, D, f(·), g(·), h(·)) from D and (sid, i, D, C) from every P_i ∈ 𝒫, the functionality verifies whether f(·), g(·) and h(·) are of degree at most t and C = Comm_ck(f(0); g(0), h(0)), where C is received from the majority of the parties.

- If any of the above verifications fail, then the functionality sends (sid, i, D, Failure) to every P_i ∈ 𝒫 and 𝒮, and halts.

- Else, for every P_i ∈ 𝒫, the functionality computes the share f_i = f(α_i), the opening information g_i = g(α_i), h_i = h(α_i), and the commitment C_{f_i,g_i,h_i} = Comm_ck(f_i; g_i, h_i). It sends (sid, i, D, [s]_i) to every P_i ∈ 𝒫, where [s]_i = (f_i, g_i, h_i, {C_{f_j,g_j,h_j}}_{P_j∈𝒫}), and halts.

Fig. 5. The Ideal Functionality for Verifiably Generating [·]-sharing
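The two checks performed by ℱ_Gen[·] can likewise be modelled directly. In this Python sketch (toy group; function names and commitment key hypothetical), polynomials are coefficient lists with the constant term first, and the string "Failure" stands for the functionality's Failure output.

```python
Q, P, G = 2039, 1019, 4                  # toy group, NOT secure
H0, H1 = pow(G, 77, Q), pow(G, 88, Q)    # hypothetical commitment key ck


def comm(x, r0, r1):
    return pow(G, x % P, Q) * pow(H0, r0 % P, Q) * pow(H1, r1 % P, Q) % Q


def degree(poly):
    """Index of the highest non-zero coefficient (-1 for the zero polynomial)."""
    nonzero = [k for k, coeff in enumerate(poly) if coeff % P]
    return nonzero[-1] if nonzero else -1


def evaluate(poly, a):
    acc = 0
    for coeff in reversed(poly):         # Horner's rule, constant term first
        acc = (acc * a + coeff) % P
    return acc


def f_gen(f, g, h, c_majority, t, alphas):
    """Toy model of F_Gen[.]: per-party [s]_i, or "Failure" if the dealer misbehaved."""
    if any(degree(poly) > t for poly in (f, g, h)):
        return "Failure"                 # check (a): a polynomial has degree > t
    if c_majority != comm(f[0], g[0], h[0]):
        return "Failure"                 # check (b): C does not commit to f(0)
    shares = [(evaluate(f, a), evaluate(g, a), evaluate(h, a)) for a in alphas]
    commitments = [comm(*sh) for sh in shares]
    return [(fi, gi, hi, commitments) for fi, gi, hi in shares]
```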

We note that ℱ_Gen[·] is slightly different from the standard ideal functionality (see e.g. [2]) of verifiable secret sharing (VSS), where the parties output only their shares (and not the commitments of all the shares). In most of the standard instantiations of a VSS functionality in the computational setting, for example the Pedersen VSS [27], a public commitment of all the shares and of the secret is available to the parties without violating any privacy. In order to make these commitments available to the external protocol that invokes ℱ_Gen[·], we allow the functionality to compute and deliver the shares along with the commitments to the parties. We note that [1] introduced a similar functionality for "committed VSS", which outputs to the parties the commitment of the secret provided by the dealer, with the same motivation as above.


3.2  Supporting  Sub-protocols 

Our MPC protocol also makes use of the following sub-protocols. Due to space constraints, here we only present a high-level description of these protocols and state their communication complexity. The formal details of the protocols are available in Appendix C. We later show that our main MPC protocol, which invokes these sub-protocols, is UC-secure, so it is not required to prove any form of security for these sub-protocols separately.

(A) Protocol $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$ (Figure 10, Appendix C): it takes as input $\langle s\rangle_{\mathcal{X}}$ for a set $\mathcal{X}$ containing at least one honest party. If all the parties in $\mathcal{X}$ behave honestly, it produces a sharing $[s]$. Otherwise it outputs one of the following: the identity of a single corrupted party, or a pair of parties from $\mathcal{X}$ with at least one of them being corrupted. The protocol makes use of the functionalities $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ and $\mathcal{F}_{\mathrm{BC}}$.

More specifically, let $\langle s\rangle_i$ denote the information (namely the share, the opening information and the set of commitments) of party $P_i \in \mathcal{X}$ corresponding to the sharing $\langle s\rangle_{\mathcal{X}}$. To achieve the goal of the protocol, there are two clear steps to perform. First, the correct commitment to each share of $s$ in its $\langle\cdot\rangle_{\mathcal{X}}$-sharing, currently available only to the parties in $\mathcal{X}$, must be made available to all the parties in $\mathcal{P}$. Second, each $P_i \in \mathcal{X}$ must act as a dealer and verifiably $[\cdot]$-share its already committed share $s_i$ among $\mathcal{P}$. Note that the commitment to $s_i$ is included in the set of commitments that is already available to $\mathcal{P}$ after the first step. Once $[s_i]$ has been generated for each $P_i \in \mathcal{X}$, $[s]$ is computed as $[s] = \sum_{P_i \in \mathcal{X}} [s_i]$; this is because $s = \sum_{P_i \in \mathcal{X}} s_i$.

Now there are two steps that may lead to the failure of the protocol. First, some $P_i \in \mathcal{X}$ may be identified as a corrupted dealer while calling $\mathcal{F}_{\mathrm{GEN}[\cdot]}$. In this case every party in $\mathcal{P}$ outputs the identity of that single corrupted party. Second, the protocol may fail when the parties in $\mathcal{P}$ try to reach agreement on the correct set of commitments to the shares of $s$. Recall that each $P_i \in \mathcal{X}$ holds a set of commitments as part of $\langle s\rangle_i$. We ask each $P_i \in \mathcal{X}$ to call $\mathcal{F}_{\mathrm{BC}}$ to broadcast among $\mathcal{P}$ the set of commitments it holds. Every $P_i \in \mathcal{X}$ must do this because we cannot trust any single party from $\mathcal{X}$: all we know (with overwhelming probability) is that $\mathcal{X}$ contains at least one honest party. If the parties in $\mathcal{P}$ receive the same set of commitments from all the parties in $\mathcal{X}$, then the received set is clearly the correct set of commitments, and agreement on it is reached among $\mathcal{P}$. Otherwise, the parties in $\mathcal{P}$ can detect a pair of parties with conflicting sets and output that pair. At least one party in such a pair must be corrupted, since all honest parties in $\mathcal{X}$ hold the same correct set. When there are multiple conflicting pairs, the parties must agree on which pair is selected. For this we assume a predefined, publicly known rule for selecting one pair from the lot, for instance the pair $(P_a, P_b)$ with the minimum value of $a + n \cdot b$. Intuitively, the protocol is secure because the shares of the honest parties in $\mathcal{X}$ remain private.
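The agreement rule and the final combination $[s] = \sum_{P_i \in \mathcal{X}} [s_i]$ can be sketched as follows. The sketch makes several simplifying assumptions. Commitment sets are modelled as tuples, and parties are indexed $1, \ldots, n$. Only the Shamir-share component of each $[s_i]$ is added, ignoring opening information and commitments. The function names are hypothetical. The selection rule is the example rule from the text: the conflicting ordered pair minimizing $a + n \cdot b$.

```python
def agree_on_commitments(broadcasts, n):
    """broadcasts maps the index a of each P_a in X to the commitment set it broadcast.

    Returns ("ok", commitments) if all sets agree, else ("conflict", (a, b)) for the
    conflicting pair with minimum a + n * b, so every honest party picks the same pair."""
    sets = list(broadcasts.values())
    if all(s == sets[0] for s in sets):
        return ("ok", sets[0])
    conflicts = [(a, b) for a in broadcasts for b in broadcasts
                 if a != b and broadcasts[a] != broadcasts[b]]
    return ("conflict", min(conflicts, key=lambda pair: pair[0] + n * pair[1]))


def add_sharings(sharings, q):
    """Party-wise sum of Shamir share vectors: shares of s_1 + ... + s_m mod q."""
    return [sum(column) % q for column in zip(*sharings)]
```

For example, adding the shares $[4, 5, 6]$ of $3 + x$ and $[6, 8, 10]$ of $4 + 2x$ (at points $1, 2, 3$) gives $[10, 13, 16]$, the shares of $7 + 3x$.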

The communication complexity of protocol $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$ is stated in Lemma 1. It follows easily from the fact that each party in $\mathcal{X}$ needs to broadcast $\mathcal{O}(|\mathcal{X}|\kappa)$ bits to $\mathcal{P}$.

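Lemma 1. The communication complexity of protocol $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$ is $\mathcal{BC}(|\mathcal{X}|^2\kappa, n)$ plus the complexity of $\mathcal{O}(|\mathcal{X}|)$ invocations of the realization of the functionality $\mathcal{F}_{\mathrm{GEN}[\cdot]}$.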

(B) Protocol $\Pi_{\langle\cdot\rangle}$ (Figure 11, Appendix C): the protocol enables a designated party (dealer) $D \in \mathcal{P}$ to verifiably $\langle\cdot\rangle$-share an already committed secret $f$ among a set of parties $\mathcal{X}$ containing at least one honest party. More specifically, every $P_i \in \mathcal{P}$ holds a (publicly known) commitment $C_{f,g,h}$. The dealer $D$ holds the secret $f \in \mathbb{F}_p$ and a randomness pair $(g, h)$ such that $C_{f,g,h} = \mathrm{Comm}_{ck}(f; g, h)$, and the goal is to generate $\langle f\rangle_{\mathcal{X}}$. In the protocol, $D$ first additively shares $f$, as well as the opening information $(g, h)$, among $\mathcal{X}$. In addition, $D$ is asked to publicly commit to each additive share of $f$, using the corresponding additive shares of $(g, h)$. The parties can then use the homomorphic property of the commitments to verify publicly whether $D$ has indeed $\langle\cdot\rangle$-shared the same $f$ that is committed in $C_{f,g,h}$. Intuitively, $f$ remains private in the protocol for an honest $D$, as there exists at least one honest party in $\mathcal{X}$. Moreover, the binding property of the commitment ensures that a potentially corrupted $D$ cannot $\langle\cdot\rangle$-share an incorrect value $f' \neq f$.
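The dealer's step and the public homomorphic check in $\Pi_{\langle\cdot\rangle}$ can be sketched as follows. The sketch again assumes the toy Pedersen-style commitment, models only the arithmetic (not the messages), and uses hypothetical function names. Because the additive shares of $(f, g, h)$ sum to $(f, g, h)$, the product of the public share commitments must equal $C_{f,g,h}$.

```python
import random

# Toy, INSECURE safe-prime group (assumption): order-Q subgroup of Z_P^*.
P, Q = 2039, 1019
G, H1, H2 = 4, 9, 25


def comm(f, g, h):
    """Assumed commitment Comm_ck(f; g, h) = G^f * H1^g * H2^h mod P."""
    return pow(G, f, P) * pow(H1, g, P) * pow(H2, h, P) % P


def additive_split(value, m):
    """Random additive shares of value over F_Q for m parties."""
    parts = [random.randrange(Q) for _ in range(m - 1)]
    return parts + [(value - sum(parts)) % Q]


def deal(f, g, h, m):
    """Dealer D: additively share (f, g, h) among the m parties of X and
    publicly commit to each additive share."""
    shares = list(zip(additive_split(f, m), additive_split(g, m), additive_split(h, m)))
    public = [comm(*share) for share in shares]
    return shares, public


def publicly_verify(public, c):
    """Any party in P: the product of the share commitments must equal C_{f,g,h}."""
    product = 1
    for ci in public:
        product = product * ci % P
    return product == c
```

If the dealer publishes a commitment to a wrong additive share, the product differs from $C_{f,g,h}$ and the check fails.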

Looked at carefully, the protocol achieves slightly more than a $\langle\cdot\rangle$-sharing of a secret among a set of parties $\mathcal{X}$. All the parties in $\mathcal{P}$ hold the commitments to the shares of $f$, whereas by the definition of $\langle\cdot\rangle$-sharing the commitments to the shares need only be available to the parties in $\mathcal{X}$. A closer look reveals that these public commitments are what let the parties in $\mathcal{P}$ verify publicly, via the homomorphic property of the commitments, whether $D$ has indeed $\langle\cdot\rangle$-shared among $\mathcal{X}$ the same $f$ as committed in $C_{f,g,h}$. The communication complexity of $\Pi_{\langle\cdot\rangle}$ is stated in Lemma 2.
Lemma 2. The communication complexity of protocol $\Pi_{\langle\cdot\rangle}$ is $\mathcal{O}(|\mathcal{X}|\kappa)$ and $\mathcal{BC}(|\mathcal{X}|\kappa, n)$.

(C) Protocol $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$ (Figure 12, Appendix C): the protocol takes as input $[s]$ for any secret $s$ and outputs $\langle s\rangle_{\mathcal{X}}$ for a designated set of parties $\mathcal{X} \subset \mathcal{P}$ containing at least one honest party.

Let $f_1, \ldots, f_n$ be the Shamir shares of $s$. The protocol is designed using the following two-stage approach.

(1) First, each party $P_k \in \mathcal{P}$ acts as a dealer and verifiably $\langle\cdot\rangle$-shares its share $f_k$ via protocol $\Pi_{\langle\cdot\rangle}$.

(2) Let $\mathcal{H}$ be the set of $|\mathcal{H}| \geq t + 1$ parties $P_k$ who have correctly $\langle\cdot\rangle$-shared their Shamir share $f_k$. Without loss of generality, let $\mathcal{H}$ be the set of the first $|\mathcal{H}|$ parties in $\mathcal{P}$. The original sharing polynomial (for $[\cdot]$-sharing $s$) has degree at most $t$ and has $s$ as its constant term. Hence there exist publicly known constants $c_1, \ldots, c_{|\mathcal{H}|}$ (namely the Lagrange interpolation coefficients) such that $s = c_1 f_1 + \cdots + c_{|\mathcal{H}|} f_{|\mathcal{H}|}$. Since the share $f_k$ of each $P_k \in \mathcal{H}$ is $\langle\cdot\rangle$-shared, each party $P_i \in \mathcal{X}$ can compute $\langle s\rangle_i = c_1\langle f_1\rangle_i + \cdots + c_{|\mathcal{H}|}\langle f_{|\mathcal{H}|}\rangle_i$.

The correctness of the protocol follows from the fact that the corrupted parties in $\mathcal{P}$ cannot $\langle\cdot\rangle$-share an incorrect Shamir share of $s$, thanks to protocol $\Pi_{\langle\cdot\rangle}$. The privacy of $s$ follows from the fact that the Shamir shares of the honest parties in $\mathcal{P}$ remain private, which in turn follows from the privacy of protocol $\Pi_{\langle\cdot\rangle}$.
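The local recombination in step (2) can be sketched as follows. The sketch makes three assumptions. It uses plain additive resharing over the prime field $\mathbb{F}_Q$ with $Q = 2^{61} - 1$, chosen only for illustration. It omits the commitments and opening information. It assumes every party in $\mathcal{H}$ reshared honestly. The function names are hypothetical.

```python
import random

Q = 2**61 - 1  # prime field F_Q (a Mersenne prime), chosen for illustration


def evaluate(coeffs, x):
    """Evaluate a polynomial over F_Q (constant term first)."""
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q


def lagrange_at_zero(xs):
    """Coefficients c_k with s = sum_k c_k * f_k for any polynomial of degree < len(xs)."""
    lams = []
    for j, xj in enumerate(xs):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (-xm) % Q
                den = den * (xj - xm) % Q
        lams.append(num * pow(den, -1, Q) % Q)
    return lams


def additive_split(value, m):
    """Random additive shares of value for m parties."""
    parts = [random.randrange(Q) for _ in range(m - 1)]
    return parts + [(value - sum(parts)) % Q]


def shamir_to_additive(shamir_shares, alphas, m):
    """Each P_k in H additively reshares f_k among the m parties of X;
    party P_i in X then outputs <s>_i = sum_k c_k * <f_k>_i."""
    coeffs = lagrange_at_zero(alphas)                  # c_1, ..., c_|H|
    resharings = [additive_split(fk, m) for fk in shamir_shares]
    return [sum(c * r[i] for c, r in zip(coeffs, resharings)) % Q for i in range(m)]
```

Summing the outputs over all $P_i \in \mathcal{X}$ gives $\sum_k c_k f_k = s$, so the result is an additive sharing of $s$.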

The communication complexity of protocol $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$ is stated in Lemma 3. It follows from the fact that $n$ invocations of $\Pi_{\langle\cdot\rangle}$ are made in the protocol.

Lemma 3. The communication complexity of $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$ is $\mathcal{O}(n|\mathcal{X}|\kappa)$ and $\mathcal{BC}(n|\mathcal{X}|\kappa, n)$.

(D) Protocol $\Pi_{\mathrm{RandZero}[\cdot]}$ (Figure 14, Appendix C): the protocol is used for generating a random $[\cdot]$-sharing of 0. To design the protocol, we also require a standard zero-knowledge (ZK) functionality $\mathcal{F}_{\mathrm{ZK.BC}}$ to prove publicly that a commitment is to zero. The functionality is a “prove-and-broadcast” functionality. Upon receiving a commitment and witness pair $(C, (u, v))$ from a designated prover $P_j$, it verifies whether $C = \mathrm{Comm}_{ck}(0; u, v)$; if so, it sends $C$ to all the parties. A protocol $\Pi_{\mathrm{ZK.BC}}$ realizing $\mathcal{F}_{\mathrm{ZK.BC}}$ can be designed in the CRS model using standard techniques, see [22], with communication complexity $\mathcal{O}(\mathrm{Poly}(n)\kappa)$.

Protocol $\Pi_{\mathrm{RandZero}[\cdot]}$ invokes the ideal functionalities $\mathcal{F}_{\mathrm{ZK.BC}}$ and $\mathcal{F}_{\mathrm{GEN}[\cdot]}$. The idea is as follows. Each party $P_i \in \mathcal{P}$ first broadcasts a random commitment to 0 and proves in zero knowledge that it indeed committed to 0. Next, $P_i$ calls $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ as a dealer $D$ to generate a $[\cdot]$-sharing of 0 that is consistent with its commitment to 0. The parties then locally add the sharings of the dealers whose calls to $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ succeeded. Since this set of dealers contains at least one honest party, the resulting sharing is indeed a random sharing of 0; see Appendix C for details. Looking ahead, we invoke $\Pi_{\mathrm{RandZero}[\cdot]}$ only once in our main MPC protocol $\Pi_f$ (more on this later), so we do not detail the communication complexity of the protocol. However, assuming a standard realization of $\mathcal{F}_{\mathrm{ZK.BC}}$, the protocol has complexity $\mathcal{O}(\mathrm{Poly}(n)\kappa)$.
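The local combination step of $\Pi_{\mathrm{RandZero}[\cdot]}$ can be sketched as follows. Only the Shamir-share component is modelled, over $\mathbb{F}_Q$ with $Q = 2^{61} - 1$ chosen for illustration. The commitments and ZK proofs are omitted, and the function names are hypothetical. Each successful dealer contributes a random degree-$t$ sharing with constant term 0, so the party-wise sum is again a degree-$t$ sharing of 0. Adding it to any sharing leaves the shared value unchanged.

```python
import random

Q = 2**61 - 1  # prime field used for illustration


def evaluate(coeffs, x):
    """Evaluate a polynomial over F_Q (constant term first)."""
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q


def interpolate_at(xs, ys, x0):
    """Value at x0 of the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x0 - xm) % Q
                den = den * (xj - xm) % Q
        total = (total + yj * num * pow(den, -1, Q)) % Q
    return total


def random_zero_sharing(t, alphas):
    """One dealer's random degree-t sharing of 0 (constant term fixed to 0)."""
    coeffs = [0] + [random.randrange(Q) for _ in range(t)]
    return [evaluate(coeffs, a) for a in alphas]


def rand_zero(dealer_sharings, successful):
    """Party-wise sum of the sharings of the dealers whose F_GEN call succeeded."""
    chosen = [dealer_sharings[d] for d in successful]
    return [sum(column) % Q for column in zip(*chosen)]
```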

(E) Dishonest-Majority MPC Protocol (Appendix D): apart from the above sub-protocols, we use a non-robust, dishonest-majority MPC protocol with fault detection. The protocol, presented in Figure 18 of Appendix D, allows a designated set of parties $\mathcal{X} \subset \mathcal{P}$, containing at least one honest party, to perform a $\langle\cdot\rangle$-shared evaluation of a given circuit $C$. If some corrupted party in $\mathcal{X}$ behaves maliciously, the parties in $\mathcal{P}$ identify a pair of parties from $\mathcal{X}$ with at least one of them being corrupted. The starting point of the protocol is the dishonest-majority MPC protocol of [12]. That protocol takes $\langle\cdot\rangle$-shared inputs of a given circuit from a set of parties, say $\mathcal{X}$, having a dishonest majority, and then achieves the following:

- If all the parties in $\mathcal{X}$ behave honestly, then the protocol outputs $\langle\cdot\rangle$-shared circuit outputs among $\mathcal{X}$.

- Else, the honest parties in $\mathcal{X}$ detect misbehaviour by the corrupted parties and abort the protocol.
We observe that, in an aborted execution of the protocol of [12], there exists an honest party in $\mathcal{X}$ that can locally identify a corrupted party from $\mathcal{X}$ who deviated from the protocol. We exploit this property in our protocol to enable the parties in $\mathcal{P}$ to identify a pair of parties from $\mathcal{X}$ with at least one of them being corrupted.

The protocol proceeds in two stages, the preparation stage and the evaluation stage, each involving various other sub-protocols (details are available in Appendix D). In the preparation stage, if all the parties in $\mathcal{X}$ behave honestly, then they jointly generate $C_M + C_R$ shared multiplication triples $\{(\langle a^{(i)}\rangle_{\mathcal{X}}, \langle b^{(i)}\rangle_{\mathcal{X}}, \langle c^{(i)}\rangle_{\mathcal{X}})\}_{i=1,\ldots,C_M+C_R}$. Each triple satisfies $c^{(i)} = a^{(i)} \cdot b^{(i)}$, and each $(a^{(i)}, b^{(i)})$ is random and unknown to the adversary. Here $C_M$ and $C_R$ are the numbers of multiplication and random gates in $C$, respectively. Otherwise, the parties in $\mathcal{P}$ identify a pair of parties in $\mathcal{X}$ with at least one of them being corrupted.

Assume the desired $\langle\cdot\rangle$-shared multiplication triples are generated in the preparation stage. The parties in $\mathcal{X}$ then start evaluating $C$ in a shared fashion, maintaining the following standard invariant for each gate of $C$: given $\langle\cdot\rangle$-shared inputs of the gate, the parties securely compute the $\langle\cdot\rangle$-shared output of the gate.

Maintaining the invariant for the linear gates in $C$ does not require any interaction, thanks to the linearity of $\langle\cdot\rangle$-sharing. For a multiplication gate, the parties take a preprocessed $\langle\cdot\rangle$-shared multiplication triple from the preparation stage (a different triple for each multiplication gate) and apply the standard Beaver's trick [3] (see protocol $\Pi_{\mathrm{BEA}}$ in Appendix D). While applying Beaver's trick, the parties in $\mathcal{X}$ need to publicly open two $\langle\cdot\rangle$-shared values using a reconstruction protocol $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ (presented in Appendix D). The opening may be non-robust*, in which case the circuit evaluation fails and the parties in $\mathcal{P}$ identify a pair of parties from $\mathcal{X}$ with at least one of them being corrupted.

For a random gate, the parties take a $\langle\cdot\rangle$-shared multiplication triple from the preparation stage (a different triple for each random gate), and the first component of the triple is taken as the output of the random gate. The protocol ends once the parties in $\mathcal{X}$ obtain the $\langle\cdot\rangle$-shared circuit outputs $\langle y_1\rangle_{\mathcal{X}}, \ldots, \langle y_{\mathrm{out}}\rangle_{\mathcal{X}}$; no reconstruction is required at the end.
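Beaver's trick for a multiplication gate can be sketched with plain additive sharing over $\mathbb{F}_Q$, where $Q = 2^{61} - 1$ is chosen for illustration. This is a simplification. The sketch ignores the commitments and MAC-style checks that make the opening in $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ detectable. It replaces the preparation stage with a trusted dealer. All function names are hypothetical. With $d = x - a$ and $e = y - b$ opened, the identity $xy = c + d\,b + e\,a + d\,e$ gives a sharing of $xy$.

```python
import random

Q = 2**61 - 1  # prime field used for illustration


def additive_split(value, m):
    """Random additive shares of value for m parties."""
    parts = [random.randrange(Q) for _ in range(m - 1)]
    return parts + [(value - sum(parts)) % Q]


def open_value(shares):
    """Public opening of an additively shared value (no robustness)."""
    return sum(shares) % Q


def beaver_multiply(x, y, a, b, c):
    """Given <x>, <y> and a triple (<a>, <b>, <c>) with c = a*b, return <x*y>."""
    d = open_value([(xi - ai) % Q for xi, ai in zip(x, a)])   # d = x - a
    e = open_value([(yi - bi) % Q for yi, bi in zip(y, b)])   # e = y - b
    z = [(ci + d * bi + e * ai) % Q for ai, bi, ci in zip(a, b, c)]
    z[0] = (z[0] + d * e) % Q   # the public term d*e is added by one party only
    return z


def random_triple(m):
    """Stand-in for the preparation stage: a random shared triple from a trusted dealer."""
    av, bv = random.randrange(Q), random.randrange(Q)
    return additive_split(av, m), additive_split(bv, m), additive_split(av * bv % Q, m)
```

For a random gate, the sketch would simply output the first component `a` of a fresh triple.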

The complete details of the protocol are provided in Appendix D. The protocol invokes two ideal functionalities, $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ and $\mathcal{F}_{\mathrm{BC}}$, where $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ is used to generate $\langle\cdot\rangle$-sharings of random values (again, see Appendix D). For our purpose, we note that the protocol provides statistical security of $2^{-s}$. Its communication complexity is stated in Lemma 4 and proved in Appendix D. Note that two types of broadcast are involved: among the parties in $\mathcal{X}$ and among the parties in $\mathcal{P}$.

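Lemma 4. For a statistical security parameter $s$, the dishonest-majority protocol has communication complexity $\mathcal{O}(|\mathcal{X}|^2(|C| + s)\kappa)$, $\mathcal{BC}(|\mathcal{X}|^2(|C| + s)\kappa, |\mathcal{X}|)$ and $\mathcal{BC}(|\mathcal{X}|\kappa, n)$.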

3.3 The MPC Protocol

Finally, we describe our MPC protocol. Recall that we divide the circuit ckt into sub-circuits $\mathrm{ckt}_1, \ldots, \mathrm{ckt}_L$, and we let $\mathrm{in}_l$ and $\mathrm{out}_l$ denote the number of input and output wires, respectively, of the sub-circuit $\mathrm{ckt}_l$.

At the beginning of the protocol, each party $[\cdot]$-shares its private input by calling $\mathcal{F}_{\mathrm{GEN}[\cdot]}$. The parties then select a random committee of parties by calling $\mathcal{F}_{\mathrm{COMMITTEE}}$; the committee evaluates the $l$th sub-circuit via the dishonest-majority MPC protocol of [12]. We use a Boolean flag NewCom in the protocol to indicate whether a new committee has to be chosen prior to the evaluation of the $l$th sub-circuit, or whether the committee used for the evaluation of the $(l-1)$th sub-circuit is retained. Specifically, a successful evaluation of a sub-circuit is followed by setting NewCom to 0, meaning that the current committee is retained for the evaluation of the next sub-circuit. A failed evaluation of a sub-circuit is followed by setting NewCom to 1. This means that a fresh committee has to be chosen to re-evaluate the same sub-circuit. It is chosen from the set of eligible parties $\mathcal{C}$, which is updated after the failed evaluation.
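The NewCom bookkeeping can be sketched as a small control loop. The callbacks are hypothetical stand-ins: `select_committee` replaces $\mathcal{F}_{\mathrm{COMMITTEE}}$, and `evaluate_subcircuit` replaces input conversion, evaluation and output conversion of $\mathrm{ckt}_l$. On a failure, the latter returns the identified party or pair. Since every failure removes at least one corrupted party, the loop performs at most $L + t$ sub-circuit evaluations.

```python
def run_subcircuits(L, t, eligible, select_committee, evaluate_subcircuit):
    """Sketch of the main loop: keep the committee after a success (NewCom = 0),
    choose a fresh one from the eligible set after a failure (NewCom = 1).

    evaluate_subcircuit(l, committee) returns None on success, or the tuple of
    identified parties (a pair, or a single party) on failure."""
    eligible = set(eligible)
    n = len(eligible)
    new_com, l, evaluations, committee = 1, 1, 0, None
    while l <= L:
        if new_com == 1:                      # choose a fresh committee
            committee = select_committee(eligible)
        evaluations += 1
        blamed = evaluate_subcircuit(l, committee)
        if blamed is None:                    # success: keep the committee
            l, new_com = l + 1, 0
        else:                                 # failure: drop the identified parties
            eligible -= set(blamed)
            t, n, new_com = t - 1, n - len(blamed), 1
    return evaluations, eligible, t, n
```

Consider a toy run with $L = 3$, six parties and corrupted parties $\{1, 2\}$. Suppose the committee is the two smallest eligible indices. Then one failed and three successful evaluations occur, within the $L + t = 5$ bound.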

* As we may not have an honest majority in $\mathcal{X}$, we cannot always ensure robust reconstruction during $\Pi_{\mathrm{REC}\langle\cdot\rangle}$.
After each successful sub-circuit evaluation, the corresponding $\langle\cdot\rangle$-shared outputs are converted into $[\cdot]$-shared outputs via protocol $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$. Before each sub-circuit evaluation, the corresponding $[\cdot]$-shared inputs are converted into the required $\langle\cdot\rangle$-shared inputs via protocol $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$. The process is repeated until the function output is $[\cdot]$-shared, after which it is robustly reconstructed (as we have an honest majority in $\mathcal{P}$).
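Robust reconstruction of a $[\cdot]$-shared output can be sketched as follows. The sketch assumes the toy Pedersen-style commitment used earlier, and the helper names are hypothetical. It first discards every share that does not open its public commitment. It then checks that the remaining shares lie on one polynomial of degree at most $t$ and reads off its value at 0.

```python
# Toy, INSECURE safe-prime group (assumption): order-Q subgroup of Z_P^*.
P, Q = 2039, 1019
G, H1, H2 = 4, 9, 25


def comm(f, g, h):
    """Assumed commitment Comm_ck(f; g, h) = G^f * H1^g * H2^h mod P."""
    return pow(G, f, P) * pow(H1, g, P) * pow(H2, h, P) % P


def interpolate_at(xs, ys, x0):
    """Value at x0 of the unique polynomial of degree < len(xs) through (xs, ys), over F_Q."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x0 - xm) % Q
                den = den * (xj - xm) % Q
        total = (total + yj * num * pow(den, -1, Q)) % Q
    return total


def reconstruct(received, commitments, alphas, t):
    """received[j] = (f_j, g_j, h_j) sent by P_j; commitments[j] = C_{f_j, g_j, h_j}."""
    good = [j for j, share in enumerate(received) if comm(*share) == commitments[j]]
    xs = [alphas[j] for j in good]
    ys = [received[j][0] for j in good]
    if len(xs) < t + 1:
        return "Failure"
    base_x, base_y = xs[:t + 1], ys[:t + 1]
    if any(interpolate_at(base_x, base_y, x) != y for x, y in zip(xs, ys)):
        return "Failure"                      # remaining shares do not fit degree <= t
    return interpolate_at(base_x, base_y, 0)
```

Given an honest majority, the shares of the honest parties alone determine the output. Any tampered share is filtered out by the commitment check.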

To ensure simulation security in the UC model, we add an output re-randomization step before the output reconstruction; it does not affect the correctness of the above steps. The parties call $\Pi_{\mathrm{RandZero}[\cdot]}$ to generate a random $[0]$, which they add to the $[\cdot]$-shared output, leaving the function output unchanged. Looking ahead to the security proof, this step lets the simulator cheat: it can set the final output to the one obtained from the functionality, even though it simulates the honest parties with 0 as their input (see Appendix E for the details).

Let $E$ be the event that each of the committees selected during sub-circuit evaluations contains at least one honest party. The event $E$ occurs except with probability at most $(t + 1) \cdot e^{-c} \approx 2^{-\kappa}$. This is because at most $t + 1$ (random) committees need to be selected: an initial selection is made, and a new committee is selected after each of the at most $t$ failed sub-circuit evaluations. It is easy to see that, conditioned on $E$, the protocol is private. The inputs of the honest parties remain private during the input stage (due to $\mathcal{F}_{\mathrm{GEN}[\cdot]}$), and none of the sub-protocols involved in the sub-circuit evaluations leaks any information about the honest parties' inputs. It also follows that, conditioned on $E$, the protocol is correct, thanks to the binding property of the commitment and the properties of the involved sub-protocols.
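Reading the bound above as requiring $(t + 1) \cdot e^{-c} \leq 2^{-\kappa}$ and solving for the committee size gives $c \geq \kappa \ln 2 + \ln(t + 1)$, i.e. $c = \mathcal{O}(\kappa + \log t)$. The helper below is a hypothetical sketch that simply evaluates this inequality.

```python
import math


def committee_size(kappa, t):
    """Smallest integer c with (t + 1) * exp(-c) <= 2**(-kappa),
    i.e. c >= kappa * ln 2 + ln(t + 1)."""
    return math.ceil(kappa * math.log(2) + math.log(t + 1))
```

For example, `committee_size(80, 99)` returns 61.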

The properties of protocol $\Pi_f$ are stated in Theorem 1, and the security proof is available in Appendix E; here we only prove the communication complexity. The (circuit-dependent) communication complexity in the theorem is derived by substituting the calls to the various ideal functionalities with the protocols implementing them. The broadcast complexity has two parts: the broadcasts among the parties in $\mathcal{P}$ and the broadcasts among small committees.

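Theorem 1. Let $f : \mathbb{F}_p^n \rightarrow \mathbb{F}_p$ be a publicly known $n$-input function with circuit representation ckt over $\mathbb{F}_p$, with average width $w$ and depth $d$ (thus $w = \frac{|\mathrm{ckt}|}{d}$). Moreover, let ckt be divided into sub-circuits $\mathrm{ckt}_1, \ldots, \mathrm{ckt}_L$, with $L = t$ and each sub-circuit $\mathrm{ckt}_l$ having fan-in $\mathrm{in}_l$ and fan-out $\mathrm{out}_l$. Furthermore, let $\mathrm{in}_l = \mathrm{out}_l = \mathcal{O}(w)$. Then, conditioned on the event $E$, protocol $\Pi_f$ $(\kappa, s)$-securely realizes the functionality $\mathcal{F}_f$ against $\mathcal{A}$ in the UC security framework. This holds in the $(\mathcal{F}_{\mathrm{CRS}}, \mathcal{F}_{\mathrm{BC}}, \mathcal{F}_{\mathrm{COMMITTEE}}, \mathcal{F}_{\mathrm{GEN}[\cdot]}, \mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}, \mathcal{F}_{\mathrm{ZK.BC}})$-hybrid model†. The circuit-dependent communication complexity of the protocol is $\mathcal{O}(|\mathrm{ckt}| \cdot (\frac{nt}{d} + \kappa) \cdot \kappa^2)$, $\mathcal{BC}(|\mathrm{ckt}| \cdot \frac{nt}{d} \cdot \kappa^2, n)$ and $\mathcal{BC}(|\mathrm{ckt}| \cdot \kappa^3, \kappa)$.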

Proof  (communication  complexity):  We  analyze  each  phase  of  the  protocol: 

1. Input Commitment Stage: here each party broadcasts $\mathcal{O}(\kappa)$ bits to the parties in $\mathcal{P}$, so the broadcast complexity of this step is $\mathcal{BC}(n\kappa, n)$.

2. $[\cdot]$-sharing of Committed Inputs: here $n$ calls to $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ are made. Realizing $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ with the protocol of Lemma 5, these calls incur a broadcast complexity of $\mathcal{BC}(n^2\kappa, n)$.

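3. Sub-circuit Evaluations: we first count the total communication cost of evaluating the sub-circuit $\mathrm{ckt}_l$ with $\mathrm{in}_l$ input gates and $\mathrm{out}_l$ output gates.

- Converting the $\mathrm{in}_l$ $[\cdot]$-shared inputs to $\mathrm{in}_l$ $\langle\cdot\rangle$-shared inputs requires $\mathrm{in}_l$ invocations of the protocol $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$. The communication complexity of this step is $\mathcal{O}(n \cdot c \cdot \mathrm{in}_l \cdot \kappa)$ and $\mathcal{BC}(n \cdot c \cdot \mathrm{in}_l \cdot \kappa, n)$; this follows from Lemma 3 by substituting $|\mathcal{X}| = c$.

- Since the size of $\mathrm{ckt}_l$ is at most $\frac{|\mathrm{ckt}|}{L}$, evaluating it via the dishonest-majority protocol has communication complexity $\mathcal{O}(c^2(\frac{|\mathrm{ckt}|}{L} + s)\kappa)$, $\mathcal{BC}(c^2(\frac{|\mathrm{ckt}|}{L} + s)\kappa, c)$ and $\mathcal{BC}(c \cdot \kappa, n)$; this follows from Lemma 4 by substituting $|\mathcal{X}| = c$.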

† See Appendix A for the meaning of the $\mathcal{G}$-hybrid model in the UC framework.
Protocol $\Pi_f(\mathcal{P}, \mathrm{ckt})$

For session ID sid, every party $P_i \in \mathcal{P}$ does the following:

Initialization. Set $\mathcal{C} = \mathcal{P}$, $n = |\mathcal{C}|$ and NewCom = 1. Divide ckt into $L$ sub-circuits $\mathrm{ckt}_1, \ldots, \mathrm{ckt}_L$, each of depth $d/L$.

CRS Generation. Invoke $\mathcal{F}_{\mathrm{CRS}}$ with (sid, i) and get back (sid, i, CRS), where CRS = (pk, ck).

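Input Commitment. On input $x^{(i)}$, choose random polynomials $f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot)$ of degree at most $t$ such that $f^{(i)}(0) = x^{(i)}$. Compute the commitment $C_{x^{(i)}, g_0^{(i)}, h_0^{(i)}} = \mathrm{Comm}_{ck}(x^{(i)}; g_0^{(i)}, h_0^{(i)})$, where $g_0^{(i)} = g^{(i)}(0)$ and $h_0^{(i)} = h^{(i)}(0)$.

- Call $\mathcal{F}_{\mathrm{BC}}$ with message (sid, i, $C_{x^{(i)}, g_0^{(i)}, h_0^{(i)}}$, $\mathcal{P}$).

- Corresponding to each $P_j \in \mathcal{P}$, receive (sid, i, j, $C_{x^{(j)}, g_0^{(j)}, h_0^{(j)}}$) from $\mathcal{F}_{\mathrm{BC}}$.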

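$[\cdot]$-sharing of Committed Inputs.

- Act as a dealer D and call $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ with (sid, i, $f^{(i)}(\cdot)$, $g^{(i)}(\cdot)$, $h^{(i)}(\cdot)$).

- For every $P_j \in \mathcal{P}$, call $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ with (sid, i, j, $C_{x^{(j)}, g_0^{(j)}, h_0^{(j)}}$).

- For every $P_j \in \mathcal{P}$, if (sid, i, j, Failure) is received from $\mathcal{F}_{\mathrm{GEN}[\cdot]}$, substitute a default, predefined public sharing $[0]$ of 0 by setting $[x^{(j)}]_i = [0]_i$. Then update $\mathcal{C} = \mathcal{C} \setminus \{P_j\}$ and decrement $t$ and $n$ by one. Else receive (sid, i, j, $[x^{(j)}]_i$) from $\mathcal{F}_{\mathrm{GEN}[\cdot]}$.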

Start of While Loop Over the Sub-circuits. Initialize $l = 1$. While $l \leq L$ do:

- Committee Selection. If NewCom = 1, then call $\mathcal{F}_{\mathrm{COMMITTEE}}$ with (sid, i, $\mathcal{C}$) and receive (sid, i, $\mathcal{X}$) from $\mathcal{F}_{\mathrm{COMMITTEE}}$.

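- $[\cdot]$ to $\langle\cdot\rangle_{\mathcal{X}}$ Conversion of Inputs of Sub-circuit $\mathrm{ckt}_l$. Let $[x_1], \ldots, [x_{\mathrm{in}_l}]$ denote the $[\cdot]$-sharings of the inputs to $\mathrm{ckt}_l$:

- For $k = 1, \ldots, \mathrm{in}_l$, participate in $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$ with (sid, i, $[x_k]_i$, $\mathcal{X}$). Output (sid, i, $\langle x_k\rangle_i$) in $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$ if $P_i$ belongs to $\mathcal{X}$; else output (sid, i).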

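- Evaluation of the Sub-circuit $\mathrm{ckt}_l$. If $P_i \in \mathcal{X}$, then participate in the dishonest-majority MPC protocol of Appendix D with (sid, i, $\langle x_1\rangle_i, \ldots, \langle x_{\mathrm{in}_l}\rangle_i$, $\mathcal{X}$); else participate in it with (sid, i, $\mathcal{X}$).

- If (sid, i, Failure, $P_a$, $P_b$) is the output of that protocol, then set $\mathcal{C} = \mathcal{C} \setminus \{P_a, P_b\}$, $t = t - 1$, $n = n - 2$, NewCom = 1 and go to the Committee Selection step.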

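- $\langle\cdot\rangle_{\mathcal{X}}$ to $[\cdot]$ Conversion of Outputs of $\mathrm{ckt}_l$. If (sid, i, Success, $\langle y_1\rangle_i, \ldots, \langle y_{\mathrm{out}_l}\rangle_i$) or (sid, i, Success) is obtained during the dishonest-majority protocol, then for $k = 1, \ldots, \mathrm{out}_l$ participate in $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$. Use the input (sid, i, $\langle y_k\rangle_i$) in the first case and (sid, i) in the second.

- If (sid, i, Success, $[y_k]_i$) is the output in $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$ for every $k = 1, \ldots, \mathrm{out}_l$, then increment $l$ and set NewCom = 0.

- If (sid, i, Failure, $P_a$, $P_b$) is the output in $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$ for some $k \in \{1, \ldots, \mathrm{out}_l\}$, then set $\mathcal{C} = \mathcal{C} \setminus \{P_a, P_b\}$, $t = t - 1$, $n = n - 2$, NewCom = 1 and go to the Committee Selection step.

- If (sid, i, Failure, $P_a$) is the output in $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$ for some $k \in \{1, \ldots, \mathrm{out}_l\}$, then set $\mathcal{C} = \mathcal{C} \setminus \{P_a\}$, $t = t - 1$, $n = n - 1$, NewCom = 1 and go to the Committee Selection step.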

Output Rerandomization. Let $[y]$ denote the $[\cdot]$-sharing of the output of ckt. Participate in $\Pi_{\mathrm{RandZero}[\cdot]}$ with (sid, i), obtain (sid, i, $[0]_i$) and locally compute $[z]_i = [y]_i + [0]_i$.

Output Reconstruction. Interpret $[z]_i$ as $(f_i, g_i, h_i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$. Initialize a set $\mathcal{T}_i$ to $\emptyset$.

- Send (sid, i, j, $f_i$, $g_i$, $h_i$) to every $P_j \in \mathcal{P}$. On receiving (sid, j, i, $f_j$, $g_j$, $h_j$) from every party $P_j$, include party $P_j$ in $\mathcal{T}_i$ if $C_{f_j,g_j,h_j} \neq \mathrm{Comm}_{ck}(f_j; g_j, h_j)$.

- Interpolate $f(\cdot)$ such that $f(\alpha_j) = f_j$ holds for every $P_j \in \mathcal{P} \setminus \mathcal{T}_i$. If $f(\cdot)$ has degree at most $t$, output (sid, i, $z = f(0)$) and halt; else output (sid, i, Failure) and halt.


Fig. 6. Protocol for UC-securely realizing $\mathcal{F}_f$
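- Finally, converting the $\mathrm{out}_l$ $\langle\cdot\rangle$-shared outputs to $[\cdot]$-shared outputs requires $\mathrm{out}_l$ invocations of the protocol $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$. This has communication complexity $\mathcal{O}(n \cdot c \cdot \mathrm{out}_l \cdot \kappa)$, $\mathcal{BC}(\mathrm{out}_l \cdot c^2 \cdot \kappa, n)$ and $\mathcal{BC}(n \cdot c \cdot \mathrm{out}_l \cdot \kappa, n)$; this follows from Lemma 1 by substituting $|\mathcal{X}| = c$.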

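Thus evaluating $\mathrm{ckt}_l$ has communication complexity

$\mathcal{O}((n^2 + n \cdot c \cdot \mathrm{in}_l + n \cdot c \cdot \mathrm{out}_l + c^2(\frac{|\mathrm{ckt}|}{L} + s))\kappa)$, $\mathcal{BC}((n^2 + n \cdot c \cdot \mathrm{in}_l + n \cdot c \cdot \mathrm{out}_l)\kappa, n)$ and $\mathcal{BC}(c^2(\frac{|\mathrm{ckt}|}{L} + s)\kappa, c)$.

Assuming $\mathrm{in}_l = \mathcal{O}(w)$ and $\mathrm{out}_l = \mathcal{O}(w)$, with $w = \frac{|\mathrm{ckt}|}{d}$, this results in

$\mathcal{O}((n^2 + n \cdot c \cdot \frac{|\mathrm{ckt}|}{d} + c^2(\frac{|\mathrm{ckt}|}{L} + s))\kappa)$, $\mathcal{BC}((n^2 + n \cdot c \cdot \frac{|\mathrm{ckt}|}{d})\kappa, n)$ and $\mathcal{BC}(c^2(\frac{|\mathrm{ckt}|}{L} + s)\kappa, c)$.

The total number of sub-circuit evaluations is at most $L + t$, with $L$ successful evaluations and at most $t$ failed evaluations. Substituting $L = t$, we get the overall communication complexity

$\mathcal{O}((|\mathrm{ckt}| \cdot (\frac{nct}{d} + c^2) + n^2 t + c^2 \cdot s \cdot t)\kappa)$, $\mathcal{BC}((|\mathrm{ckt}| \cdot \frac{nct}{d} + n^2 t)\kappa, n)$ and $\mathcal{BC}((|\mathrm{ckt}| \cdot c^2 + c^2 \cdot s \cdot t)\kappa, c)$.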

4. Output Rerandomization and Reconstruction: these steps cost $\mathcal{O}(\mathrm{Poly}(n, \kappa))$ bits.

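The circuit-dependent complexity of the whole protocol is therefore $\mathcal{O}(|\mathrm{ckt}| \cdot (\frac{nct}{d} + c^2)\kappa)$ bits of communication over the point-to-point channels. Its broadcast complexity is $\mathcal{BC}(|\mathrm{ckt}| \cdot \frac{nct}{d} \cdot \kappa, n)$ and $\mathcal{BC}(|\mathrm{ckt}| \cdot c^2 \cdot \kappa, c)$. Since $c$ has to be selected so that $e^{-c} < 2^{-\kappa}$ holds, asymptotically we can set $c$ to be $\mathcal{O}(\kappa)$. (For any practical purpose, $\kappa = 80$ is good enough.) It follows that the (circuit-dependent) communication complexity is $\mathcal{O}(|\mathrm{ckt}|(\frac{nt}{d} + \kappa)\kappa^2)$, $\mathcal{BC}(|\mathrm{ckt}| \cdot \frac{nt}{d} \cdot \kappa^2, n)$ and $\mathcal{BC}(|\mathrm{ckt}| \cdot \kappa^3, \kappa)$. □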

We propose two optimizations for our MPC protocol that improve its communication complexity.


$[\cdot]$-sharing among a smaller subset of $\mathcal{P}$. For simplicity, the protocol involves the entire set of parties in $\mathcal{P}$ in holding $[\cdot]$-shared values. However, it is enough to fix and involve a set of just $z$ parties that guarantees an honest majority with overwhelming probability. From our analysis in Section 1, we find that $z = \mathcal{O}(\kappa)$. Indeed, all we require from the set holding a $[\cdot]$-sharing is an honest majority, which can be attained with a set of $\mathcal{O}(\kappa)$ parties. This optimization replaces $n$ by $\kappa$ in the complexity expressions of Theorem 1. It implies that the (circuit-dependent) communication complexity is $\mathcal{O}(|\mathrm{ckt}|(\frac{\kappa t}{d} + \kappa)\kappa^2)$, $\mathcal{BC}(|\mathrm{ckt}| \cdot \frac{\kappa t}{d} \cdot \kappa^2, \kappa)$ and $\mathcal{BC}(|\mathrm{ckt}| \cdot \kappa^3, \kappa)$. Instantiating the broadcast functionality in the modified protocol with the Dolev-Strong (DS) broadcast protocol (see Appendix B), we obtain the following:


Corollary 1. If $d = \omega(t)$ and if the calls to $\mathcal{F}_{\mathrm{BC}}$ are realized via the DS broadcast protocol, then the
circuit-dependent  communication  complexity  of  Uf  is  0(|ckt|  •  kJ). 

Suppose we restrict to widths $w$ of the form $w = \omega(\kappa^2)$. Then we can instantiate all the invocations of $\mathcal{F}_{\mathrm{BC}}$ in the protocols $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$ and $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$ (invoked after and before the sub-circuit evaluations, respectively) with the Fitzi-Hirt (FH) multi-valued broadcast protocol [19]; see Appendix B. This is because setting $w = \omega(\kappa^2)$ ensures that the FH protocol's bound on message size is met. The relevant message is the combined message that any party broadcasts over all the instances of $\Pi_{\langle\cdot\rangle\rightarrow[\cdot]}$ (respectively $\Pi_{[\cdot]\rightarrow\langle\cdot\rangle}$). Incorporating the above, we obtain the following corollary with a better result.

Corollary 2. If $d = \omega(t)$ and $w = \omega(\kappa^2)$ (i.e. $|\mathrm{ckt}| = \omega(\kappa^2 t)$), then the circuit-dependent communication
complexity  of  Uf  is  0(|ckt|  •  n^). 


Packed Secret-Sharing. We can employ the packed secret-sharing technique of [20] to checkpoint multiple outputs of the sub-circuits together in a single $[\cdot]$-sharing. Specifically, if all the parties in $\mathcal{P}$ hold a $[\cdot]$-sharing, we can pack $n - 2t$ values into a single $[\cdot]$-sharing by setting the degree of the underlying polynomials to $n - t - 1$. Robust reconstruction of such a $[\cdot]$-sharing is still possible. There are $n - t$ honest parties in $\mathcal{P}$, and exactly $n - t$ shares are required to reconstruct a polynomial of degree $n - t - 1$. For every sub-circuit $\mathrm{ckt}_l$, the $\mathrm{out}_l$ output values are grouped so that each group contains $n - 2t$ secrets, and each group is then converted into a single $[\cdot]$-sharing.
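Packing can be sketched as follows. It works over $\mathbb{F}_Q$ with $Q = 2^{61} - 1$, chosen for illustration, and omits commitments and opening information. Placing the $k = n - 2t$ secrets at the points $0, -1, \ldots, -(k-1)$ is an assumption of this sketch, as are the function names. A further $t$ random values at the next $t$ points fix $n - t$ points, hence a polynomial of degree at most $n - t - 1$. Party $P_i$ receives its value at $x = i$, and any $n - t$ correct shares determine all packed secrets.

```python
import random

Q = 2**61 - 1  # prime field used for illustration


def interpolate_at(xs, ys, x0):
    """Value at x0 of the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x0 - xm) % Q
                den = den * (xj - xm) % Q
        total = (total + yj * num * pow(den, -1, Q)) % Q
    return total


def packed_share(secrets, n, t):
    """Pack k = n - 2t secrets into one polynomial of degree at most n - t - 1."""
    k = len(secrets)
    assert k == n - 2 * t
    xs = [(-j) % Q for j in range(k + t)]            # 0, -1, ..., -(k + t - 1)
    ys = list(secrets) + [random.randrange(Q) for _ in range(t)]
    return [interpolate_at(xs, ys, i) for i in range(1, n + 1)]


def packed_reconstruct(points, k):
    """Recover the k packed secrets from n - t (or more) correct points (i, share_i)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [interpolate_at(xs, ys, (-j) % Q) for j in range(k)]
```

With $n = 7$ and $t = 2$, three secrets fit in one sharing, and any five shares recover all three.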

Suppose we restrict to circuits in which every wire has length at most $d/L = d/t$, i.e. spans at most $d/L$ levels. Then the outputs of sub-circuit $\mathrm{ckt}_l$ can only be inputs to sub-circuit $\mathrm{ckt}_{l+1}$. With this restriction, packed secret-sharing becomes applicable at all stages. The communication complexity then becomes $\mathcal{O}(|\mathrm{ckt}| \cdot (\frac{t}{d} + \kappa) \cdot \kappa^2)$, $\mathcal{BC}(|\mathrm{ckt}| \cdot \frac{t}{d} \cdot \kappa^2, n)$ and $\mathcal{BC}(|\mathrm{ckt}| \cdot \kappa^3, \kappa)$. This is a factor of $n$ less in the first two terms compared to what is stated in Theorem 1. Realizing the broadcasts using the DS and FH protocols, respectively, we obtain the following corollaries:

Corollary  3.  If  d  =  and  if  the  calls  to  realized  via  the  DS  broadcast  protocol,  then  the 

circuit-dependent  communication  complexity  of  Uj  is  0(|ckt|  •  n^). 

Corollary  4.  lfd  =  uj{'^)andw  =  uj{n?-{n  +  K))(i.e.  |ckt|  =  u}{^^{n-\-K))),  then  the  circuit-dependent 
communication  complexity  of  the  protocol  Uf  is  0(|ckt|  •  k^). 
Appendices


A  The  UC  Security  Model 

We work in the standard Universal Composability (UC) framework of Canetti [7], with static corruption. The UC framework introduces a PPT environment $\mathcal{Z}$ that oversees the execution of a protocol in one of two worlds. $\mathcal{Z}$ is invoked on the computational security parameter $\kappa$, the statistical security parameter $s$ and an auxiliary input $z$. The “ideal” world execution involves dummy parties $P_1, \ldots, P_n$, an ideal adversary $\mathcal{S}$ who may corrupt some of the dummy parties, and a functionality $\mathcal{F}$. The “real” world execution involves the PPT parties $P_1, \ldots, P_n$ and a real-world adversary $\mathcal{A}$ who may corrupt some of the parties. In either world, a PPT adversary can corrupt $t$ out of the $n$ parties. The environment $\mathcal{Z}$ chooses the inputs of the parties and may interact with the ideal-world or real-world adversary during the execution. At the end of the execution, it has to decide and output whether a real or an ideal world execution has taken place.

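We let $\mathrm{IDEAL}_{\mathcal{F},\mathcal{S},\mathcal{Z}}(\kappa, s, z)$ denote the random variable describing the output of the environment $\mathcal{Z}$ after interacting with the ideal execution with adversary $\mathcal{S}$ and functionality $\mathcal{F}$. The execution is run on the computational security parameter $\kappa$, the statistical security parameter $s$ and auxiliary input $z$. Let $\mathrm{IDEAL}_{\mathcal{F},\mathcal{S},\mathcal{Z}}$ denote the ensemble $\{\mathrm{IDEAL}_{\mathcal{F},\mathcal{S},\mathcal{Z}}(\kappa, s, z)\}_{\kappa, s \in \mathbb{N}, z \in \{0,1\}^*}$. Similarly, let $\mathrm{REAL}_{\Pi,\mathcal{A},\mathcal{Z}}(\kappa, s, z)$ denote the random variable describing the output of the environment $\mathcal{Z}$ after interacting in a real execution of a protocol $\Pi$ with adversary $\mathcal{A}$ and the parties $\mathcal{P}$, again on $\kappa$, $s$ and $z$. Let $\mathrm{REAL}_{\Pi,\mathcal{A},\mathcal{Z}}$ denote the ensemble $\{\mathrm{REAL}_{\Pi,\mathcal{A},\mathcal{Z}}(\kappa, s, z)\}_{\kappa, s \in \mathbb{N}, z \in \{0,1\}^*}$.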
Definition 4. For $n \in \mathbb{N}$, let $\mathcal{F}$ be an $n$-ary functionality and let $\Pi$ be an $n$-party protocol. We say that $\Pi$ $(\kappa, s)$-securely realizes $\mathcal{F}$ in the UC security framework if, for every PPT real world adversary $\mathcal{A}$, there exists a PPT ideal world adversary $\mathcal{S}$, corrupting the same parties, such that the following two distributions are computationally indistinguishable in $\kappa$, with all but $2^{-s}$ probability:

$$\mathrm{IDEAL}_{\mathcal{F},\mathcal{S},\mathcal{Z}} \stackrel{c}{\approx} \mathrm{REAL}_{\Pi,\mathcal{A},\mathcal{Z}}.$$

We consider the above definition quantified over different classes of adversaries (passive or active) that corrupt only a certain number of parties. Note that the security offered by the statistical security parameter $s$ does not depend upon the computational power of the adversary.

Modular Composition: A great advantage of the UC model is that it allows one to prove the security of protocols in a modular fashion. Specifically, the sequential modular composition theorem [7] states that, in order to analyze the security of a protocol $\pi_f$ for computing a function $f$ that uses a subprotocol $\pi_g$ for computing another function $g$, it suffices to consider the execution of $\pi_f$ in a model where a trusted third party is used to ideally compute $g$ (instead of the parties running the real subprotocol $\pi_g$). This facilitates a modular analysis of security: we first prove the security of $\pi_g$ (as per the UC definition) and then prove the security of $\pi_f$ assuming an ideal party (functionality) for $g$. This model, in which $\pi_f$ is analyzed using ideal calls to $g$ instead of executing $\pi_g$, is called the $g$-hybrid model, because it involves both a real protocol execution (for computing $f$) and an ideal trusted third party computing $g$.

B  UC-secure  Instantiation  of  Various  Functionalities 
B.1 Protocol for Realizing $\mathcal{F}_{\mathrm{Gen}[\cdot]}$

We design a protocol $\Pi_{[\cdot]}$, presented in Figure 7, for realizing the functionality $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ in the UC framework. We closely follow the standard Pedersen VSS scheme [27] against a threshold static adversary. Specifically, let $C$ be the existing commitment available to the parties in $\mathcal{P}$ such that $C = \mathsf{Comm}_{ck}(s; g, h)$, and let $(s, g, h)$ be available to D. To $[\cdot]$-share $s$, the dealer D selects three random polynomials $f(\cdot)$, $g(\cdot)$ and $h(\cdot)$, each of degree at most $t$, such that $f(0) = s$, $g(0) = g$ and $h(0) = h$. To every party $P_i$ in $\mathcal{P}$, D distributes the share $f_i = f(\alpha_i)$ and the opening information $g_i = g(\alpha_i)$ and $h_i = h(\alpha_i)$. Additionally, D publicly commits to the shares of all the share-holders, with the corresponding opening information acting as the randomness for the commitments. Namely, D broadcasts $\{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}}$ via $\mathcal{F}_{\mathrm{BC}}$.

Every honest party $P_i$ then verifies three conditions: (1) whether the commitments correspond to polynomials of degree at most $t$; (2) whether the commitments are consistent with $C$, in the sense that the constant terms of the polynomials committed via $\{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}}$ are indeed embedded in $C$; and (3) whether $f_i, g_i, h_i$ received over the point-to-point channel is consistent with $C_{f_i,g_i,h_i}$ received via $\mathcal{F}_{\mathrm{BC}}$. The first two tests can be done by appealing to the homomorphic property of the commitment scheme. If either of the first two tests fails, then $P_i$ concludes that D is corrupted and outputs Failure. If the last test fails (but the first two succeed), then $P_i$ publicly complains about D, who resolves the complaint by revealing $f_i, g_i, h_i$ via $\mathcal{F}_{\mathrm{BC}}$. Subsequently, the third test is checked with the $f_i, g_i, h_i$ received publicly from D. If the test is successful, $P_i$ accepts the new $f_i, g_i, h_i$ and outputs $[s]_i = (f_i, g_i, h_i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$. Otherwise $P_i$ outputs Failure.

Intuitively, the privacy of the shared secret $s$ for an honest D follows from the fact that $\mathcal{A}$ may learn at most $t$ shares, which constitute $t$ distinct points on $f(\cdot)$ of degree $t$; so from the adversary's point of view there is one “degree of freedom”: for every possible choice of $s$, there exists a unique polynomial $f(\cdot)$ of degree $t$ which is consistent with the shares received by $\mathcal{A}$. Note that the publicly known commitments to the shares do not provide any additional information about the unknown shares to $\mathcal{A}$, thanks to the (statistical) hiding property of the commitment scheme and the fact that the corresponding randomness lies on polynomials of degree at most $t$, of which $\mathcal{A}$ is given at most $t$ points, again implying one degree of freedom.

Protocol $\Pi_{[\cdot]}$

The public input to the protocol is a publicly known commitment $C$ available to the parties in $\mathcal{P}$, while the private input for D is a secret $s$ and randomness pair $(g, h)$ such that $C = \mathsf{Comm}_{ck}(s; g, h)$ holds. For session ID sid, D and the parties in $\mathcal{P}$ do the following:

Round 1 (Share Distribution and Broadcasting Commitments): D does the following:

- Select three random polynomials $f(\cdot)$, $g(\cdot)$ and $h(\cdot)$ of degree at most $t$, subject to the condition that $f(0) = s$, $g(0) = g$ and $h(0) = h$.

- Corresponding to every $P_i \in \mathcal{P}$, compute the share $f_i = f(\alpha_i)$, the opening information $g_i = g(\alpha_i)$, $h_i = h(\alpha_i)$ and the commitment $C_{f_i,g_i,h_i} = \mathsf{Comm}_{ck}(f_i; g_i, h_i)$.

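- Corresponding to every $P_i \in \mathcal{P}$, send $(\mathsf{sid}, i, (f_i, g_i, h_i))$ to the party $P_i$. In addition, call $\mathcal{F}_{\mathrm{BC}}$ with $(\mathsf{sid}, D, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}}, \mathcal{P})$.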

Round 2 (Consistency Verification and Complaints): Every party $P_i \in \mathcal{P}$ does the following:

- Receive $(\mathsf{sid}, i, D, (f_i, g_i, h_i))$ from D and $(\mathsf{sid}, i, D, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$ from $\mathcal{F}_{\mathrm{BC}}$.

- Using the homomorphic property of the commitments, verify

• whether there exist polynomials of degree at most $t$, say $f'(\cdot)$, $g'(\cdot)$ and $h'(\cdot)$, such that $C_{f_j,g_j,h_j} = \mathsf{Comm}_{ck}(f'(\alpha_j); g'(\alpha_j), h'(\alpha_j))$ for every $P_j \in \mathcal{P}$;$^{a}$

• whether $C$ is the same as $\mathsf{Comm}_{ck}(f'(0); g'(0), h'(0))$.

If any of the above tests fails, then output $(\mathsf{sid}, i, \mathsf{Failure})$ and halt.

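- Verify whether $C_{f_i,g_i,h_i} \stackrel{?}{=} \mathsf{Comm}_{ck}(f_i; g_i, h_i)$. If the verification fails, then call $\mathcal{F}_{\mathrm{BC}}$ with $(\mathsf{sid}, i, (\mathsf{Unhappy}, D), \mathcal{P})$.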

Local Computation (at the end of Round 2): Every party $P_i$ in $\mathcal{P}$ does the following:

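- Construct a set $\mathcal{W}_i$ initialized to $\emptyset$ and add $P_j \in \mathcal{P}$ to $\mathcal{W}_i$ if the message $(\mathsf{sid}, i, j, (\mathsf{Unhappy}, D))$ corresponding to party $P_j$ is received from $\mathcal{F}_{\mathrm{BC}}$.$^{b}$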

- If $|\mathcal{W}_i| > t$, then output $(\mathsf{sid}, i, \mathsf{Failure})$ and halt.

Round 3 (Resolving Complaints): D does the following:

- Corresponding to each $P_i \in \mathcal{W}_D$, call $\mathcal{F}_{\mathrm{BC}}$ with the message $(\mathsf{sid}, D, (\mathsf{Resolve}, P_i, f_i, g_i, h_i), \mathcal{P})$.

Local Computation (at the end of Round 3): Every party $P_i \in \mathcal{P}$ does the following:

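- If there exists a $P_k \in \mathcal{W}_i$ corresponding to which the message $(\mathsf{sid}, i, D, (\mathsf{Resolve}, P_k, f_k, g_k, h_k))$ is received from $\mathcal{F}_{\mathrm{BC}}$ such that $C_{f_k,g_k,h_k} \neq \mathsf{Comm}_{ck}(f_k; g_k, h_k)$, then output $(\mathsf{sid}, i, \mathsf{Failure})$ and halt.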

- Else output $[s]_i$, computed as follows, and halt:

• If $P_i \in \mathcal{P} \setminus \mathcal{W}_i$, then $[s]_i = (f_i, g_i, h_i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$, where $f_i, g_i, h_i$ is received from D in Round 1.

• If $P_i \in \mathcal{W}_i$, then $[s]_i = (f_i, g_i, h_i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$, where $f_i, g_i, h_i$ is received from D in Round 3.

$^{a}$ This is done using a standard procedure based on the properties of the Vandermonde matrix; see for example [21].

$^{b}$ The contents of $\mathcal{W}_i$ will be the same for each honest party $P_i$ in $\mathcal{P}$.


Fig. 7. Protocol for UC-securely realizing $\mathcal{F}_{\mathrm{Gen}[\cdot]}$

The properties of the protocol $\Pi_{[\cdot]}$ are formally stated in Lemma 5.

Lemma 5. Let $D \in \mathcal{P}$ be a dealer with secret $s$ and randomness pair $(g, h)$, and let $C$ be a publicly known commitment available to the parties in $\mathcal{P}$. Then the protocol $\Pi_{[\cdot]}$ UC-securely realizes the functionality $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ in the $\mathcal{F}_{\mathrm{BC}}$-hybrid model. The protocol has communication complexity $O(n\kappa)$ bits and $\mathrm{BC}(n\kappa, n)$.

Proof: The communication complexity follows easily from the protocol. We next prove the security, considering the following two cases.

Case 1: When D is honest. We first claim that in this case no honest party will output Failure; this easily follows from the fact that an honest D will distribute consistent shares, and only the corrupted share-holders (at most $t$) will accuse D, and such accusations will be resolved correctly by D. Let $\mathcal{T} \subset \mathcal{P}$ be the set of parties under the control of $\mathcal{A}$ during the protocol $\Pi_{[\cdot]}$; we present a simulator $\mathcal{S}_{[\cdot]}$ (interacting with the functionality $\mathcal{F}_{\mathrm{Gen}[\cdot]}$) for $\mathcal{A}$ in Figure 8. The high-level idea behind the simulator is the following: the simulator interacts with $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ and obtains the shares and opening information of the corrupted parties, along with all the committed shares, and sends the same to the real-world adversary; the simulator then simulates the rest of the protocol steps of $\Pi_{[\cdot]}$ on behalf of the honest parties (including D). Any accusation by a (corrupted) share-holder can be easily resolved by the simulator, as it knows the corresponding share and opening information (as obtained from the functionality), which it can reveal. It follows easily that the simulated view has exactly the same distribution as the view of the real-world adversary in $\Pi_{[\cdot]}$.

Simulator $\mathcal{S}_{[\cdot]}$

The simulator plays the role of the honest parties (including D) and simulates each step of the protocol $\Pi_{[\cdot]}$ as follows. The communication of $\mathcal{Z}$ with the adversary $\mathcal{A}$ is handled as follows: every input value received by the simulator from $\mathcal{Z}$ is written on $\mathcal{A}$'s input tape. Likewise, every output value written by $\mathcal{A}$ on its output tape is copied to the simulator's output tape (to be read by the environment $\mathcal{Z}$). The simulator then does the following for the session ID sid:

- Interact with $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ and obtain $(\mathsf{sid}, i, (f_i, g_i, h_i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}}))$ corresponding to every corrupted party $P_i \in \mathcal{T}$.

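- On behalf of D, send $(\mathsf{sid}, i, (f_i, g_i, h_i))$ to $\mathcal{A}$, corresponding to every $P_i \in \mathcal{T}$. In addition, send $(\mathsf{sid}, D, i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$ to $\mathcal{A}$ on behalf of $\mathcal{F}_{\mathrm{BC}}$, corresponding to each $P_i \in \mathcal{T}$.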

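- On receiving $(\mathsf{sid}, i, \mathcal{P}, (\mathsf{Unhappy}, D))$ as the message to $\mathcal{F}_{\mathrm{BC}}$ from $\mathcal{A}$ on behalf of any $P_i \in \mathcal{T}$, send $(\mathsf{sid}, D, i, (\mathsf{Resolve}, P_i, f_i, g_i, h_i))$ to $\mathcal{A}$ as the message from $\mathcal{F}_{\mathrm{BC}}$ on behalf of D.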

The simulator then outputs $\mathcal{A}$'s output and terminates.


Fig. 8. Simulator for the adversary $\mathcal{A}$ corrupting at most $t$ parties in the set $\mathcal{T} \subset \mathcal{P} \setminus \{D\}$ in the protocol $\Pi_{[\cdot]}$.


Case 2: When D is corrupted. We first note that there exist at least $t + 1$ honest parties in $\mathcal{P}$, and that at most one polynomial of degree at most $t$ passes through any set of $t + 1$ or more distinct points. With these facts, we next prove the security with respect to a corrupted D. Let $\mathcal{T} \subset \mathcal{P}$ be the set of parties under the control of $\mathcal{A}$, including D, during the protocol $\Pi_{[\cdot]}$; we present a simulator $\mathcal{S}_{[\cdot]}$ (interacting with the functionality $\mathcal{F}_{\mathrm{Gen}[\cdot]}$) for $\mathcal{A}$ in Figure 9.

It follows easily that the simulated view is computationally indistinguishable from the view of the real-world adversary; otherwise we can use the corresponding distinguisher to break the binding property of the underlying commitment scheme. □
Simulator $\mathcal{S}_{[\cdot]}$

The simulator plays the role of the honest parties and simulates each step of the protocol $\Pi_{[\cdot]}$ as follows. The communication of $\mathcal{Z}$ with the adversary $\mathcal{A}$ is handled as follows: every input value received by the simulator from $\mathcal{Z}$ is written on $\mathcal{A}$'s input tape. Likewise, every output value written by $\mathcal{A}$ on its output tape is copied to the simulator's output tape (to be read by the environment $\mathcal{Z}$). The simulator then does the following for the session ID sid:

- Play the role of the $n - |\mathcal{T}|$ honest parties and interact with $\mathcal{A}$ as per the protocol $\Pi_{[\cdot]}$.

- If Failure is obtained during the simulated execution of the protocol because the committed shares and the corresponding randomness do not lie on polynomials of degree at most $t$, then send three arbitrary polynomials of degree more than $t$ on behalf of D to the functionality $\mathcal{F}_{\mathrm{Gen}[\cdot]}$.

- Else define three polynomials $f(\cdot)$, $g(\cdot)$ and $h(\cdot)$ of degree at most $t$ such that $f(\alpha_i) = f_i$, $g(\alpha_i) = g_i$ and $h(\alpha_i) = h_i$ holds for every honest party $P_i \notin \mathcal{T}$, where $f_i$ and $(g_i, h_i)$ are the corresponding share and opening information, respectively, obtained by $P_i$ during the simulated run of $\Pi_{[\cdot]}$.$^{a}$ Then send the polynomials $f(\cdot)$, $g(\cdot)$ and $h(\cdot)$ on behalf of D to $\mathcal{F}_{\mathrm{Gen}[\cdot]}$.

The simulator then outputs $\mathcal{A}$'s output and terminates.


$^{a}$ Note that $f(\cdot)$, $g(\cdot)$ and $h(\cdot)$ are well defined, as there exist $n - |\mathcal{T}| \geq t + 1$ honest parties in $\mathcal{P}$.

Fig. 9. Simulator for the adversary $\mathcal{A}$ corrupting at most $t$ parties in the set $\mathcal{T} \subset \mathcal{P}$, including D, during the protocol $\Pi_{[\cdot]}$.

B.2 Protocols for Realizing $\mathcal{F}_{\mathrm{Committee}}$ and $\mathcal{F}_{\mathrm{BC}}$

The Committee Selection Protocol: The functionality $\mathcal{F}_{\mathrm{Committee}}$ can be realized in various standard ways; moreover, the functionality will be invoked at most $t + 1$ times in our MPC protocol: $t$ times corresponding to $t$ failed sub-circuit evaluations, plus once for the initial selection of a committee. As this cost is independent of the circuit size $|\mathsf{ckt}|$ (it is rather $\mathrm{Poly}(n)$), we give only a high-level sketch of one of the possible instantiations of $\mathcal{F}_{\mathrm{Committee}}$, based on a computationally secure pseudo-random number generator (PRNG) [29]. Assume we have a PRNG $\mathcal{R}_k(\cdot)$ with seed $k$, which outputs values in the range $1, \ldots, n$. Then each time a committee needs to be formed, the parties in $\mathcal{P}$ can agree on a random seed $k$; this can be done via a standard method, say by coin-flipping (or by executing an instance of $\Pi_{[\cdot]}$ on behalf of each party). Then the parties can (locally) run $\mathcal{R}$ with the obtained seed until they obtain the desired committee. It follows from the security of $\mathcal{R}$ that the committee selected in this way is indeed a uniformly random committee of parties, with high probability. We can simplify further by putting a random seed in the CRS, rather than sampling a random seed on the fly every time a committee needs to be formed.
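A possible local committee derivation from a shared seed is sketched below in Python. As an assumption, the PRNG $\mathcal{R}_k$ is stood in for by SHA-256 applied to the seed and a counter; rejection sampling removes modulo bias and repeated indices are redrawn, so with an ideal PRNG the output is a uniformly random committee. The function name and the byte encoding are hypothetical.

```python
import hashlib

def select_committee(seed: bytes, n: int, size: int) -> list:
    """Derive `size` distinct party indices in 1..n from a shared seed.

    Hypothetical stand-in for the PRNG R_k: SHA-256(seed || counter), with rejection
    sampling so that every accepted value is uniform in 1..n."""
    if not 0 < size <= n:
        raise ValueError("committee size must be in 1..n")
    limit = (2**64 // n) * n              # largest multiple of n not exceeding 2^64
    committee, counter = [], 0
    while len(committee) < size:
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
        value = int.from_bytes(digest[:8], "big")
        if value >= limit:                # reject to avoid modulo bias
            continue
        candidate = value % n + 1
        if candidate not in committee:    # redraw on repeats: sampling without replacement
            committee.append(candidate)
    return committee

seed = bytes(32)  # e.g. a seed fixed in the CRS or obtained by coin-flipping
print(select_committee(seed, n=100, size=5))
```

Since the derivation is deterministic in the seed, every party that runs it with the same seed obtains the same committee without exchanging further messages.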

The Broadcast Protocol: Assuming a PKI set-up, the well-known Dolev-Strong (DS) broadcast protocol [16] allows a sender $\mathsf{Sen} \in \mathcal{P}$ to reliably broadcast a message $m$ of size $\ell$ to a set of parties $\mathcal{X} \subseteq \mathcal{P}$, provided $\mathcal{X}$ has at least one honest party; the protocol can be used to realize $\mathcal{F}_{\mathrm{BC}}$. As stated in [19], using the DS protocol it costs the parties in $\mathcal{X} \cup \{\mathsf{Sen}\}$ a total communication of OdAp ■ i ■ k) bits over the point-to-point channels to enable $\mathsf{Sen}$ to broadcast $m$ to the parties in $\mathcal{X}$. As the protocol is well known in the literature, we avoid giving the details here and instead refer interested readers to [18]. We also note that [19] suggests an improved proposal for realizing $\mathcal{F}_{\mathrm{BC}}$ with a communication complexity of 0(1^1 • i + \X\^ ■ {\X\ -F k)) bits, but with a restriction on the size of $\ell$, namely £ = w(| • {\X\ -F k)). We make use of this proposal for estimating the communication complexity of our MPC protocol in Section 3.3.

C  Supporting  Sub-Protocols 

In this appendix, we present the details of the sub-protocols which enable a number of tasks, such as conversion from $[\cdot]$-sharing to $\langle\cdot\rangle$-sharing and vice versa, and generating a random $[\cdot]$-sharing of 0.
C.1 Protocol for Converting a $\langle\cdot\rangle$-sharing to a $[\cdot]$-sharing

The protocol is presented in Figure 10.


Protocol 

For session id sid, every party $P_i \in \mathcal{P}$ participates with either $(\mathsf{sid}, i, \langle s \rangle_i)$ or $(\mathsf{sid}, i)$ and does the following:

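- If $P_i \in \mathcal{X}$, interpret $\langle s \rangle_i$ as $(s_i, u_i, v_i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ and invoke $\mathcal{F}_{\mathrm{BC}}$ with $(\mathsf{sid}, i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}}, \mathcal{P})$.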

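- Receive $(\mathsf{sid}, i, k, \{C^{(k)}_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ from $\mathcal{F}_{\mathrm{BC}}$ for every $P_k \in \mathcal{X}$ (who acted as the sender), where the superscript $(k)$ marks the set broadcast by $P_k$.

- If there exists a pair of parties $P_a, P_b \in \mathcal{X}$ such that $\{C^{(a)}_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}} \neq \{C^{(b)}_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}}$, then output $(\mathsf{sid}, i, \mathsf{Failure}, P_a, P_b)$ and halt; if there are multiple such pairs $(P_a, P_b)$, then select the one with the least indices $a$ and $b$. Else set $\{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}} = \{C^{(k^*)}_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}}$ to be the reference set of commitments, where $P_{k^*}$ is the least indexed party in $\mathcal{X}$.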

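- If $P_i \in \mathcal{X}$, act as a dealer D and call $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ with $(\mathsf{sid}, i, f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot))$, where $f^{(i)}(\cdot)$, $g^{(i)}(\cdot)$ and $h^{(i)}(\cdot)$ are random polynomials of degree at most $t$, subject to the condition that $f^{(i)}(0) = s_i$, $g^{(i)}(0) = u_i$ and $h^{(i)}(0) = v_i$. If $P_i \in \mathcal{P} \setminus \mathcal{X}$, invoke $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ with $(\mathsf{sid}, i, k, C_{s_k,u_k,v_k})$ for every $P_k \in \mathcal{X}$, where $C_{s_k,u_k,v_k}$ is obtained from the reference set of commitments. Receive $(\mathsf{sid}, i, k, \mathsf{Failure})$ or $(\mathsf{sid}, i, k, [s_k]_i)$ from $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ for every $P_k \in \mathcal{X}$.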

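- Output $(\mathsf{sid}, i, \mathsf{Failure}, P_k)$ and halt if $(\mathsf{sid}, i, k, \mathsf{Failure})$ is received from $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ corresponding to any $P_k \in \mathcal{X}$.

Otherwise, locally compute $[s]_i = \sum_{P_k \in \mathcal{X}} [s_k]_i$, output $(\mathsf{sid}, i, \mathsf{Success}, [s]_i)$ and halt.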


Fig. 10. Protocol for Converting $\langle\cdot\rangle$-sharing to $[\cdot]$-sharing in the $(\mathcal{F}_{\mathrm{BC}}, \mathcal{F}_{\mathrm{Gen}[\cdot]})$-hybrid Model
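The linearity that makes Figure 10 work can be checked with the Python sketch below. It is a simplification under stated assumptions: commitments, the consistency check on the broadcast commitment sets and all Failure handling are omitted, $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ is replaced by plain degree-$t$ Shamir sharing over a toy prime field (the modulus Q is hypothetical), and the evaluation points are $\alpha_i = i$.

```python
import secrets

Q = 1019  # toy prime modulus for the share field (hypothetical, insecure)

def shamir_share(secret, n, t):
    """Degree-t Shamir sharing of `secret`: P_i receives f(i) for i = 1..n."""
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t)]
    return {i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q for i in range(1, n + 1)}

def reconstruct_at_zero(points):
    """Lagrange interpolation at 0 from a dict {alpha_i: share_i}."""
    total = 0
    for xj, yj in points.items():
        num, den = 1, 1
        for xm in points:
            if xm != xj:
                num = num * (-xm) % Q
                den = den * (xj - xm) % Q
        total = (total + yj * num * pow(den, -1, Q)) % Q
    return total

def additive_to_shamir(additive_shares, n, t):
    """Each committee member P_k re-shares its additive share s_k with degree t;
    every party P_i then adds up the sub-shares [s_k]_i it received (local step)."""
    sub_sharings = [shamir_share(s_k, n, t) for s_k in additive_shares]
    return {i: sum(sh[i] for sh in sub_sharings) % Q for i in range(1, n + 1)}

n, t, s = 7, 3, 123
parts = [secrets.randbelow(Q) for _ in range(2)]
additive = parts + [(s - sum(parts)) % Q]        # <s> held by a 3-member committee
shamir = additive_to_shamir(additive, n, t)
subset = {i: shamir[i] for i in (2, 3, 5, 7)}    # any t + 1 shares suffice
assert reconstruct_at_zero(subset) == s
```

Since each sub-sharing has degree at most $t$, so does their sum, and any $t + 1$ of the resulting shares interpolate to $s$.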


C.2 Protocol for Generating $\langle\cdot\rangle$-sharing of a Committed Secret

Protocol $\Pi_{\langle\cdot\rangle}$ is presented in Figure 11.
Protocol $\Pi_{\langle\cdot\rangle}$

For session ID sid, every $P_i \in \mathcal{P}$ participates with $(\mathsf{sid}, i, D, C_{f,g,h}, \mathcal{X})$, where $C_{f,g,h}$ is a (publicly known) commitment.

The dealer D participates with $(\mathsf{sid}, D, f, g, h, \mathcal{X})$, where $f$ is the secret and $(g, h)$ is the randomness pair such that $C_{f,g,h} = \mathsf{Comm}_{ck}(f; g, h)$. The parties in $\mathcal{P}$ do the following:

Round  1  (Share  Distribution  and  Broadcasting  Commitments):  Only  D  does  the  following: 

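- Corresponding to every $P_i \in \mathcal{X}$, select a random share $s_i$ and a random pair of opening information $u_i, v_i$, subject to the condition that $\sum_{P_i \in \mathcal{X}} s_i = f$, $\sum_{P_i \in \mathcal{X}} u_i = g$ and $\sum_{P_i \in \mathcal{X}} v_i = h$, and compute the commitment $C_{s_i,u_i,v_i} = \mathsf{Comm}_{ck}(s_i; u_i, v_i)$. Send $(\mathsf{sid}, i, D, s_i, u_i, v_i)$ to the party $P_i$.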

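- Call $\mathcal{F}_{\mathrm{BC}}$ with $(\mathsf{sid}, D, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}}, \mathcal{P})$ to broadcast $\{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}}$ to all the parties in $\mathcal{P}$.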

Round 2 (Consistency Verification and Complaints): Every party $P_i \in \mathcal{P}$ does the following:

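- Receive $(\mathsf{sid}, i, D, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ from $\mathcal{F}_{\mathrm{BC}}$. Additionally, if $P_i \in \mathcal{X}$, then receive $(\mathsf{sid}, i, D, s_i, u_i, v_i)$ from D.

- Verify whether $\bigodot_{P_j \in \mathcal{X}} C_{s_j,u_j,v_j} \stackrel{?}{=} C_{f,g,h}$ (homomorphically). If the verification fails, then output $(\mathsf{sid}, i, D, \mathsf{Failure})$ and halt.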

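- If $P_i \in \mathcal{X}$, then verify whether $C_{s_i,u_i,v_i} \stackrel{?}{=} \mathsf{Comm}_{ck}(s_i; u_i, v_i)$. If the verification fails, then call $\mathcal{F}_{\mathrm{BC}}$ with $(\mathsf{sid}, i, (\mathsf{Unhappy}, i, D), \mathcal{P})$.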

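- Construct a set $\mathcal{W}_i$ initialized to $\emptyset$ and add $P_j \in \mathcal{X}$ to $\mathcal{W}_i$ if $(\mathsf{sid}, i, j, (\mathsf{Unhappy}, j, D))$ corresponding to $P_j$ is received from $\mathcal{F}_{\mathrm{BC}}$.$^{a}$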

Round  3  (Resolving  Complaints):  Only  D  does  the  following: 

- Corresponding to each $P_i \in \mathcal{W}_D$, call $\mathcal{F}_{\mathrm{BC}}$ with the message $(\mathsf{sid}, D, (\mathsf{Resolve}, i, s_i, u_i, v_i), \mathcal{P})$.

Local Computation (at the end of Round 3): Every party $P_i \in \mathcal{P}$ does the following:

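- If there exists a $P_k \in \mathcal{W}_i$ corresponding to which the message $(\mathsf{sid}, i, D, (\mathsf{Resolve}, k, s_k, u_k, v_k))$ is received from $\mathcal{F}_{\mathrm{BC}}$ such that $C_{s_k,u_k,v_k} \neq \mathsf{Comm}_{ck}(s_k; u_k, v_k)$, then output $(\mathsf{sid}, i, D, \mathsf{Failure})$ and halt.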

- Else every $P_i \in \mathcal{P} \setminus \mathcal{X}$ outputs $(\mathsf{sid}, i, D, \mathsf{Success})$ and halts, while every $P_i \in \mathcal{X}$ does the following:

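• If $P_i \in \mathcal{P} \setminus \mathcal{W}_i$, then set $\langle f \rangle_i = (s_i, u_i, v_i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$, where $(s_i, u_i, v_i)$ was received from D and $\{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}}$ was received from $\mathcal{F}_{\mathrm{BC}}$ at the end of Round 1. Output $(\mathsf{sid}, i, D, \mathsf{Success}, \langle f \rangle_i)$ and halt.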

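• Else if $P_i \in \mathcal{W}_i$, then set $\langle f \rangle_i = (s_i, u_i, v_i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$, where $(s_i, u_i, v_i)$ was received from $\mathcal{F}_{\mathrm{BC}}$ (corresponding to D) at the end of Round 3 and $\{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}}$ was received from $\mathcal{F}_{\mathrm{BC}}$ at the end of Round 1.

Output $(\mathsf{sid}, i, D, \mathsf{Success}, \langle f \rangle_i)$ and halt.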

$^{a}$ The contents of $\mathcal{W}_i$ will be the same for each honest party $P_i$ in $\mathcal{P}$.


Fig. 11. Protocol for Verifiably $\langle\cdot\rangle$-sharing an Existing Committed Secret
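Rounds 1 and 2 of Figure 11 can be sketched as follows. As before, a hypothetical toy Pedersen-style commitment $G^{x} H_1^{r_1} H_2^{r_2} \bmod P$ stands in for the double-trapdoor homomorphic scheme, so the homomorphic combination of commitments is simply their product; complaint handling (Round 3) is omitted and all names are illustrative.

```python
import secrets
from math import prod

# Toy, INSECURE group (hypothetical): order-Q subgroup of Z_P^*, with P = 2Q + 1.
P, Q = 2039, 1019
G, H1, H2 = 4, 9, 25

def comm(x, r1, r2):
    """Hypothetical homomorphic commitment Comm(x; r1, r2) = G^x * H1^r1 * H2^r2 mod P."""
    return pow(G, x, P) * pow(H1, r1, P) * pow(H2, r2, P) % P

def additive_split(value, m):
    """m uniformly random summands in Z_Q that add up to `value`."""
    parts = [secrets.randbelow(Q) for _ in range(m - 1)]
    return parts + [(value - sum(parts)) % Q]

def dealer_round1(f, g, h, m):
    """Split (f, g, h) into (s_i, u_i, v_i) for the m members of X and commit to each triple."""
    triples = list(zip(additive_split(f, m), additive_split(g, m), additive_split(h, m)))
    return triples, [comm(*tr) for tr in triples]

def round2_public_check(C_fgh, commitments):
    """Every party: the broadcast commitments must combine homomorphically to C_{f,g,h}."""
    return prod(commitments) % P == C_fgh

def round2_private_check(triple, commitment):
    """A member of X: its private triple must open its broadcast commitment."""
    return comm(*triple) == commitment

f, g, h = 5, secrets.randbelow(Q), secrets.randbelow(Q)
C_fgh = comm(f, g, h)
triples, commitments = dealer_round1(f, g, h, m=4)
assert round2_public_check(C_fgh, commitments)
assert all(round2_private_check(tr, c) for tr, c in zip(triples, commitments))
```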
C.3 Protocol for Transforming $[\cdot]$-sharing to $\langle\cdot\rangle$-sharing

The protocol is presented in Figure 12.


Protocol 

For session ID sid, every party $P_i \in \mathcal{P}$ participates in the protocol with $(\mathsf{sid}, i, [s]_i, \mathcal{X})$, where $[s]_i = (f_i, g_i, h_i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$, and does the following:

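Verifiably $\langle\cdot\rangle$-sharing the Share and Opening Information in $[s]_i$. Act as a dealer D and participate in an instance of $\Pi_{\langle\cdot\rangle}$ with input $(\mathsf{sid}, i, f_i, g_i, h_i, \mathcal{X})$. For every $P_k \in \mathcal{P}$, participate in the instance of $\Pi_{\langle\cdot\rangle}$ corresponding to the dealer $P_k$ with input $(\mathsf{sid}, i, k, C_{f_k,g_k,h_k}, \mathcal{X})$.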

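Identifying the Correctly $\langle\cdot\rangle$-shared Shares of $s$ and Generating $\langle s \rangle_{\mathcal{X}}$. If $P_i \in \mathcal{P} \setminus \mathcal{X}$, then output $(\mathsf{sid}, i)$ and halt. Else construct a set $\mathcal{H}_i$ initialized to $\emptyset$.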

- Include $P_k \in \mathcal{P}$ in $\mathcal{H}_i$ if $(\mathsf{sid}, i, k, \mathsf{Success}, \langle f_k \rangle_i)$ is the output of the instance of $\Pi_{\langle\cdot\rangle}$ where $P_k$ acted as the dealer.

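- Without loss of generality, let$^{a}$ $\mathcal{H}_i = \{P_1, \ldots, P_{|\mathcal{H}_i|}\}$ and let $c_1, \ldots, c_{|\mathcal{H}_i|}$ be the publicly known Lagrange interpolation coefficients such that $c_1 f_1 + \ldots + c_{|\mathcal{H}_i|} f_{|\mathcal{H}_i|} = s$. Then locally compute $\langle s \rangle_i = c_1 \langle f_1 \rangle_i + \ldots + c_{|\mathcal{H}_i|} \langle f_{|\mathcal{H}_i|} \rangle_i$, output $(\mathsf{sid}, i, \langle s \rangle_i)$ and halt.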

$^{a}$ The set $\mathcal{H}_i$ will be of size at least $t + 1$.


Fig. 12. Protocol for Converting an $[\cdot]$-sharing to $\langle\cdot\rangle$-sharing.
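The local computation in Figure 12 amounts to a Lagrange combination of additive sharings. The Python sketch below checks this over a toy prime field (the modulus Q and the set H are hypothetical), ignoring the commitments and opening information that actual $\langle\cdot\rangle$-shares carry; the set H plays the role of $\mathcal{H}_i$.

```python
import secrets

Q = 1019  # toy prime modulus (hypothetical, insecure)

def lagrange_at_zero(points):
    """Coefficients c_k with f(0) = sum_k c_k * f(k) for every f of degree < len(points)."""
    coeffs = []
    for k in points:
        num, den = 1, 1
        for m in points:
            if m != k:
                num = num * (-m) % Q
                den = den * (k - m) % Q
        coeffs.append(num * pow(den, -1, Q) % Q)
    return coeffs

def shamir_share(secret, n, t):
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t)]
    return {k: sum(c * pow(k, e, Q) for e, c in enumerate(coeffs)) % Q for k in range(1, n + 1)}

def additive_split(value, m):
    parts = [secrets.randbelow(Q) for _ in range(m - 1)]
    return parts + [(value - sum(parts)) % Q]

def shamir_to_additive(shamir, honest_set, m):
    """Each dealer P_k in honest_set additively shares f_k among the m members of X;
    member i outputs <s>_i = sum_k c_k * <f_k>_i."""
    pts = sorted(honest_set)
    coeffs = lagrange_at_zero(pts)
    sub = {k: additive_split(shamir[k], m) for k in pts}
    return [sum(c * sub[k][i] for c, k in zip(coeffs, pts)) % Q for i in range(m)]

n, t, s = 7, 3, 77
shamir = shamir_share(s, n, t)
H = {1, 2, 4, 5, 7}              # dealers whose instance ended in Success (at least t + 1)
additive = shamir_to_additive(shamir, H, m=3)
assert sum(additive) % Q == s
```

Any set of at least $t + 1$ successful dealers yields the same $s$, since their shares determine the degree-$t$ polynomial.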


C.4 Protocol for Generating Random $[\cdot]$-sharing of 0

As mentioned earlier, the protocol uses the ideal ZK functionality $\mathcal{F}_{\mathrm{ZK.BC}}$ presented in Figure 13.

Functionality $\mathcal{F}_{\mathrm{ZK.BC}}$

$\mathcal{F}_{\mathrm{ZK.BC}}$ interacts with a prover $P_j \in \mathcal{P}$, the set of $n$ verifiers $\mathcal{P} = \{P_1, \ldots, P_n\}$ and the adversary $\mathcal{S}$, and is parameterized by the commitment key $ck$ of a double-trapdoor homomorphic commitment scheme.

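- Upon receiving $(\mathsf{sid}, j, C, u, v)$ from the prover $P_j$ and $(\mathsf{sid}, j, i)$ from every party $P_i \in \mathcal{P} \setminus \{P_j\}$, the functionality sends $(\mathsf{sid}, i, C)$ to every party $P_i \in \mathcal{P}$ and $\mathcal{S}$ and halts if $C = \mathsf{Comm}_{ck}(0; u, v)$ is true. Else the functionality sends $(\mathsf{sid}, i, \bot)$ to every party $P_i \in \mathcal{P}$ and $\mathcal{S}$ and halts.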


Fig.  13.  The  Ideal  Functionality  for  ZK  Proof  of  Committing  Zero 

Now, based on the functionalities $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ and $\mathcal{F}_{\mathrm{ZK.BC}}$, protocol $\Pi_{\mathrm{RandZero}[\cdot]}$ is presented in Figure 14.
Protocol $\Pi_{\mathrm{RandZero}[\cdot]}$

For the session id sid, every party $P_i \in \mathcal{P}$ participates with $(\mathsf{sid}, i)$ and does the following:

Publicly  Committing  0: 

- Set $r_i = 0$, randomly select $u_i, v_i \in \mathbb{F}_p$ and compute $C_{r_i,u_i,v_i} = \mathsf{Comm}_{ck}(r_i; u_i, v_i)$.

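- Act as a prover and call $\mathcal{F}_{\mathrm{ZK.BC}}$ with $(\mathsf{sid}, i, C_{r_i,u_i,v_i}, u_i, v_i)$. Corresponding to every prover $P_j \in \mathcal{P} \setminus \{P_i\}$, participate in $\mathcal{F}_{\mathrm{ZK.BC}}$ with $(\mathsf{sid}, j, i)$.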

- Construct a set $\mathcal{T}_i$ initialized to $\emptyset$ and include $P_j$ in $\mathcal{T}_i$ if, corresponding to the prover $P_j$, $(\mathsf{sid}, i, C_{r_j,u_j,v_j})$ is received from $\mathcal{F}_{\mathrm{ZK.BC}}$.

$[\cdot]$-sharing 0:

- Select three random polynomials $f^{(i)}(\cdot)$, $g^{(i)}(\cdot)$ and $h^{(i)}(\cdot)$, each of degree at most $t$, subject to the condition that $f^{(i)}(0) = r_i$, $g^{(i)}(0) = u_i$ and $h^{(i)}(0) = v_i$.

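- Act as a dealer D and call $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ with $(\mathsf{sid}, P_i, f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot))$. Corresponding to every dealer $P_j \in \mathcal{T}_i$, participate in $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ with $(\mathsf{sid}, i, j, C_{r_j,u_j,v_j})$.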

- If, corresponding to any $P_j \in \mathcal{T}_i$, $(\mathsf{sid}, i, P_j, \mathsf{Failure})$ is received from $\mathcal{F}_{\mathrm{Gen}[\cdot]}$, then remove $P_j$ from $\mathcal{T}_i$.

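- Locally compute $[0]_i = \sum_{P_j \in \mathcal{T}_i} [r_j]_i$, where $(\mathsf{sid}, i, j, [r_j]_i)$ is received from $\mathcal{F}_{\mathrm{Gen}[\cdot]}$ corresponding to $P_j \in \mathcal{T}_i$. Output $(\mathsf{sid}, i, [0]_i)$ and halt.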


Fig. 14. Protocol $\Pi_{\mathrm{RandZero}[\cdot]}$ for generating a random $[\cdot]$-sharing of 0.
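The effect of filtering through $\mathcal{F}_{\mathrm{ZK.BC}}$ in Figure 14 can be simulated in Python as follows. Assumptions of this sketch: the ideal functionality is replaced by a direct check that the commitment opens to 0 under the supplied randomness, a party that commits to 1 is a hypothetical cheater, the sharings of the randomness polynomials $g^{(i)}, h^{(i)}$ are omitted, and the toy group parameters are insecure.

```python
import secrets

# Toy, INSECURE parameters (hypothetical): G, H1, H2 generate the order-Q subgroup mod P.
P, Q = 2039, 1019
G, H1, H2 = 4, 9, 25

def comm(x, r1, r2):
    """Hypothetical homomorphic commitment Comm(x; r1, r2) = G^x * H1^r1 * H2^r2 mod P."""
    return pow(G, x, P) * pow(H1, r1, P) * pow(H2, r2, P) % P

def zk_bc(C, u, v):
    """Stand-in for the ideal F_ZK.BC: publish C if it opens to 0 under (u, v), else None."""
    return C if C == comm(0, u, v) else None

def shamir_share(secret, n, t):
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t)]
    return {i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q for i in range(1, n + 1)}

def reconstruct_at_zero(points):
    total = 0
    for xj, yj in points.items():
        num, den = 1, 1
        for xm in points:
            if xm != xj:
                num = num * (-xm) % Q
                den = den * (xj - xm) % Q
        total = (total + yj * num * pow(den, -1, Q)) % Q
    return total

def run_rand_zero(n, t, cheaters=()):
    T, sub = [], {}
    for i in range(1, n + 1):
        u, v = secrets.randbelow(Q), secrets.randbelow(Q)
        r = 1 if i in cheaters else 0          # a hypothetical cheater commits to 1
        if zk_bc(comm(r, u, v), u, v) is not None:
            T.append(i)                        # only parties whose proof succeeds enter T
            sub[i] = shamir_share(0, n, t)     # [r_i] with r_i = 0
    shares = {i: sum(sub[j][i] for j in T) % Q for i in range(1, n + 1)}
    return T, shares

T, shares = run_rand_zero(n=7, t=3, cheaters={2})
assert T == [1, 3, 4, 5, 6, 7]
assert reconstruct_at_zero(shares) == 0
```

Each honest contribution is a fresh random degree-$t$ sharing of 0, so the sum is again a sharing of 0 whose individual shares are randomized.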


D Protocol for $\langle\cdot\rangle$-shared Evaluation of a Circuit

As evident from the high-level description of the protocol in Section 3.2, its major step is the preparation stage for generating the shared triplets. Towards constructing the preparation-stage protocol and the protocol for $\langle\cdot\rangle$-shared circuit evaluation, we begin with the building blocks and sub-protocols, most of which are taken from [12], while the rest are modified according to our needs. Many of the sub-protocols are described with respect to a set of parties $\mathcal{X} \subseteq \mathcal{P}$, where we assume that $\mathcal{X}$ contains at least one honest party.

Strong Semi-honest Secure Two-party Multiplication Protocol. Protocol $\Pi_{\mathrm{Mult}}(a, b) \to (c_1, c_2)$ is a two-party protocol. The inputs of the first and second party are $a$ and $b$, respectively. The outputs to the first and second party are $c_1$ and $c_2$, respectively. It holds that $c_1$ is random in $\mathbb{F}_p$ and $c_1 + c_2 = a \cdot b$. Informally, the protocol satisfies the following properties (for the complete formal details, see [12]):

- The protocol is secure even if the adversary maliciously chooses the randomness for the corrupted parties (this is the reason [12] calls the protocol strong semi-honest secure).

- The view of the protocol commits the adversary to his randomness, and given the view and the randomness it is possible to verify whether any party deviated from the protocol.

In our context, the second property of $\Pi_{\mathrm{Mult}}$ is crucial, as it enables an honest party involved in $\Pi_{\mathrm{Mult}}$ to identify any malicious behavior of its partner in the protocol when the individual randomness is revealed. There are various standard ways of instantiating $\Pi_{\mathrm{Mult}}(a, b)$, based on a variety of standard assumptions, such as homomorphic encryption, oblivious transfer (OT), etc. An instantiation based on Paillier encryption with communication complexity $O(\kappa)$ is provided in [12].

Semi-honest Secure Triple Generation Protocol. The protocol $\Pi_{\mathrm{Triple}}$ (see Figure 15) uses the two-party protocol $\Pi_{\mathrm{Mult}}$ as a sub-protocol and allows a set of parties $\mathcal{X} \subseteq \mathcal{P}$ to generate one $\langle\cdot\rangle$-shared multiplication triplet $(\langle a \rangle_{\mathcal{X}}, \langle b \rangle_{\mathcal{X}}, \langle c \rangle_{\mathcal{X}})$. The protocol is executed assuming a semi-honest adversary. It is based on the following idea: every party $P_i \in \mathcal{X}$ selects random $a_i$ and $b_i$ and commits to them. Then we set $a$ and $b$ to be the sums of all the $a_i$'s and $b_i$'s, respectively. To set $c = a \cdot b$, every pair of parties $P_i, P_j \in \mathcal{X}$ needs to securely compute the “cross-terms” $a_i \cdot b_j$ and $a_j \cdot b_i$, for which they execute two instances of $\Pi_{\mathrm{Mult}}$. Once $P_i$ computes its $c_i$, it publicly commits to it. Instantiating the calls to $\Pi_{\mathrm{Mult}}$ with that of [12] (based on Paillier encryption), protocol $\Pi_{\mathrm{Triple}}$ has communication complexity $O(|\mathcal{X}|^2 \kappa)$ and $\mathrm{BC}(|\mathcal{X}|\kappa, |\mathcal{X}|)$.


Protocol $\Pi_{\mathrm{Triple}}$

The public input to the protocol is a set of parties $\mathcal{X} \subseteq \mathcal{P}$ containing at least one honest party. For the session id sid, every party $P_i \in \mathcal{X}$ participates with $(\mathsf{sid}, i)$ and does the following:

- Randomly select shares $a_i$ and $b_i$.

- For all $P_j \in \mathcal{X} \setminus \{P_i\}$, run $\Pi_{\mathrm{Mult}}(a_i, b_j) \to (d_{ij}, e_{ji})$ as party 1.

- For all $P_j \in \mathcal{X} \setminus \{P_i\}$, run $\Pi_{\mathrm{Mult}}(a_j, b_i) \to (d_{ji}, e_{ij})$ as party 2.

- Set $c_i = a_i \cdot b_i + \sum_{P_j \in \mathcal{X} \setminus \{P_i\}} d_{ij} + \sum_{P_j \in \mathcal{X} \setminus \{P_i\}} e_{ij}$.

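- Randomly select $q_i, r_i, s_i, t_i, u_i$ and $v_i$ and compute the commitments $C_{a_i,q_i,r_i} = \mathsf{Comm}_{ck}(a_i; q_i, r_i)$, $C_{b_i,s_i,t_i} = \mathsf{Comm}_{ck}(b_i; s_i, t_i)$ and $C_{c_i,u_i,v_i} = \mathsf{Comm}_{ck}(c_i; u_i, v_i)$. Call $\mathcal{F}_{\mathrm{BC}}$ with $(\mathsf{sid}, i, (C_{a_i,q_i,r_i}, C_{b_i,s_i,t_i}, C_{c_i,u_i,v_i}), \mathcal{X})$.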

- Corresponding to each $P_j \in \mathcal{X}$, receive $(\mathsf{sid}, i, j, C_{a_j,q_j,r_j}, C_{b_j,s_j,t_j}, C_{c_j,u_j,v_j})$ from $\mathcal{F}_{\mathrm{BC}}$.

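- Set $\langle a \rangle_i = (a_i, q_i, r_i, \{C_{a_j,q_j,r_j}\}_{P_j \in \mathcal{X}})$, $\langle b \rangle_i = (b_i, s_i, t_i, \{C_{b_j,s_j,t_j}\}_{P_j \in \mathcal{X}})$ and $\langle c \rangle_i = (c_i, u_i, v_i, \{C_{c_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$. Output $(\mathsf{sid}, i, \langle a \rangle_i, \langle b \rangle_i, \langle c \rangle_i)$ and halt.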


Fig. 15. Protocol for Generating One $\langle\cdot\rangle$-shared Multiplication Triple Assuming No Active Corruptions.
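The correctness of the cross-term computation in Figure 15 can be checked with the following Python sketch. As an assumption, $\Pi_{\mathrm{Mult}}$ is replaced by a hypothetical ideal function returning a uniformly random $c_1$ and $c_2 = a \cdot b - c_1$, the field $\mathbb{F}_p$ is a toy prime field, and the commitments are omitted.

```python
import secrets

Q = 1019  # toy prime modulus standing in for F_p (hypothetical)

def ideal_mult(a, b):
    """Hypothetical ideal stand-in for Pi_Mult(a, b) -> (c1, c2): c1 uniform, c1 + c2 = a * b."""
    c1 = secrets.randbelow(Q)
    return c1, (a * b - c1) % Q

def triple_shares(m):
    """Additive shares (a_i, b_i, c_i) with sum(c) = sum(a) * sum(b), following Figure 15
    without commitments; the m members of X are indexed 0..m-1."""
    a = [secrets.randbelow(Q) for _ in range(m)]
    b = [secrets.randbelow(Q) for _ in range(m)]
    d = [[0] * m for _ in range(m)]   # d[i][j]: P_i's output (party 1) of Pi_Mult(a_i, b_j)
    e = [[0] * m for _ in range(m)]   # e[j][i]: P_j's output (party 2) of that same instance
    for i in range(m):
        for j in range(m):
            if i != j:
                d[i][j], e[j][i] = ideal_mult(a[i], b[j])
    # c_i = a_i * b_i + sum_{j != i} d_ij + sum_{j != i} e_ij
    c = [(a[i] * b[i] + sum(d[i][j] + e[i][j] for j in range(m) if j != i)) % Q
         for i in range(m)]
    return a, b, c

a, b, c = triple_shares(4)
assert sum(c) % Q == (sum(a) * sum(b)) % Q
```

Summing $c_i$ over all parties, each cross-term $a_i b_j$ with $i \neq j$ is recovered exactly once as $d_{ij} + e_{ji}$, which together with the diagonal terms $a_i b_i$ gives $c = a \cdot b$.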


Functionality for Generating $\langle\cdot\rangle$-sharing of a Random Value. Functionality $\mathcal{F}_{\mathrm{GenRand}\langle\cdot\rangle}$ (presented in Figure 16) generates a $\langle\cdot\rangle$-shared random value within a designated set of parties $\mathcal{X} \subseteq \mathcal{P}$, where each party in $\mathcal{X}$ “contributes” its “part” of the share and opening information for the shared random value.

Functionality $\mathcal{F}_{\mathrm{GenRand}\langle\cdot\rangle}$

The functionality interacts with a designated set of parties $\mathcal{X} \subseteq \mathcal{P}$ containing at least one honest party and the adversary $\mathcal{S}$, and is parametrized by the commitment key $ck$ of a double-trapdoor commitment scheme. For the session id sid, the functionality does the following:

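- On receiving $(\mathsf{sid}, i, s_i, u_i, v_i)$ from each party $P_i \in \mathcal{X}$, compute $s = \sum_{P_i \in \mathcal{X}} s_i$, $u = \sum_{P_i \in \mathcal{X}} u_i$, $v = \sum_{P_i \in \mathcal{X}} v_i$ and $C_{s,u,v} = \mathsf{Comm}_{ck}(s; u, v)$. In addition, for each $P_i \in \mathcal{X}$, compute $C_{s_i,u_i,v_i} = \mathsf{Comm}_{ck}(s_i; u_i, v_i)$. Finally, send $(\mathsf{sid}, i, C_{s,u,v}, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ to every $P_i \in \mathcal{X}$ and halt.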


Fig. 16. Functionality for Generating $\langle\cdot\rangle$-shared Random Value for a Designated Set $\mathcal{X} \subseteq \mathcal{P}$ with Dishonest Majority.

In [12], a realization of $\mathcal{F}_{\mathrm{GenRand}\langle\cdot\rangle}$ based on a UC-secure multi-party commitment scheme was presented in the common reference string (CRS) model. The UC-secure multi-party commitment scheme is in turn constructed using a CCA-secure encryption scheme and the double-trapdoor homomorphic commitment scheme introduced in Section 2. Specifically, the following was shown; we refer to [12] for the details of the instantiation of $\mathcal{F}_{\mathrm{GenRand}\langle\cdot\rangle}$.

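Lemma 6 ([12]). Assuming a CCA-secure encryption scheme and a double-trapdoor homomorphic commitment scheme, it is possible to $(\kappa, s)$-securely realize $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ in the $(\mathcal{F}_{\mathrm{CRS}}, \mathcal{F}_{\mathrm{BC}})$-hybrid model in the UC framework. The protocol generates $\ell$ $\langle\cdot\rangle$-shared random values and has communication complexity $\mathrm{BC}(|\mathcal{X}|(\ell + s)\kappa, |\mathcal{X}|)$.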

Protocol for Reconstructing a $\langle\cdot\rangle$-shared Value. Protocol $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ takes as input a $\langle\cdot\rangle$-sharing, say $\langle s\rangle_{\mathcal{X}}$, and either allows the honest parties in $\mathcal{X}$ to robustly reconstruct $s$ or ensures that the honest parties in $\mathcal{X}$ can (locally) identify at least one corrupted party in $\mathcal{X}$. The protocol is based on the following standard idea: let $\langle s\rangle_i = (s_i, u_i, v_i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ be the information available to party $P_i \in \mathcal{X}$ corresponding to $\langle s\rangle_{\mathcal{X}}$. Each $P_i$ broadcasts $s_i, u_i, v_i$ to the parties in $\mathcal{X}$ via $\mathcal{F}_{\mathrm{BC}}$. Let each party $P_i$ receive $s_j, u_j, v_j$ from $P_j$. $P_i$ then verifies whether $C_{s_j,u_j,v_j} = \mathrm{Comm}_{ck}(s_j; u_j, v_j)$. If the verification fails, $P_i$ identifies $P_j$ as corrupted and outputs (Failure, $i$, $j$); otherwise $P_i$ sums up all the shares to obtain $s$ and outputs (Success, $i$, $s$). In the rest of the description, we will say that the parties in $\mathcal{X}$ participate in $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ with $\langle s\rangle_{\mathcal{X}}$ and each $P_i \in \mathcal{X}$ outputs either (Success, $i$, $s$) or (Failure, $i$, $j$) to mean the above. The protocol has communication complexity $\mathrm{BC}(|\mathcal{X}|\kappa, |\mathcal{X}|)$.
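One honest party's side of $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ can be sketched in Python as below, using a toy Pedersen-style commitment $g^{s}h_1^{u}h_2^{v}$ (an assumption; the toy group offers no security). Broadcast via $\mathcal{F}_{\mathrm{BC}}$ is modelled simply as a dictionary of openings that every party sees identically, and reporting the smallest failing index is a choice made for this sketch.

```python
Q, P = 2039, 1019     # toy group parameters (assumption, not secure)
G, H1, H2 = 4, 9, 25  # toy commitment key


def comm(s, u, v):
    return pow(G, s % P, Q) * pow(H1, u % P, Q) * pow(H2, v % P, Q) % Q


def rec(i, commitments, openings):
    """View of party P_i in Pi_REC.

    commitments[j] is the stored C_{s_j,u_j,v_j}; openings[j] = (s_j, u_j, v_j)
    is what P_j broadcast. Returns ("Success", i, s) or ("Failure", i, j).
    """
    for j in sorted(openings):
        if comm(*openings[j]) != commitments[j]:
            return ("Failure", i, j)  # smallest index whose opening does not verify
    return ("Success", i, sum(s for s, _, _ in openings.values()) % P)


openings = {0: (3, 1, 4), 1: (5, 9, 2), 2: (6, 5, 3)}
commitments = {j: comm(*o) for j, o in openings.items()}
print(rec(0, commitments, openings))  # ('Success', 0, 14)
tampered = dict(openings)
tampered[1] = (6, 9, 2)               # P_1 lies about its share
print(rec(0, commitments, tampered))  # ('Failure', 0, 1)
```

Because the commitment is binding, a party that opens a value other than its committed share is caught by every honest verifier, which is what lets the protocol either succeed or name a culprit.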

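Beaver's Multiplication Protocol. Protocol $\Pi_{\mathrm{BEA}}(\langle x\rangle_{\mathcal{X}}, \langle y\rangle_{\mathcal{X}}, \langle a\rangle_{\mathcal{X}}, \langle b\rangle_{\mathcal{X}}, \langle c\rangle_{\mathcal{X}})$ is a standard protocol for securely computing $\langle x \cdot y\rangle_{\mathcal{X}}$ from $\langle x\rangle_{\mathcal{X}}$ and $\langle y\rangle_{\mathcal{X}}$ at the cost of two public reconstructions. The protocol assumes that the parties in $\mathcal{X} \subseteq \mathcal{P}$ have access to a $\langle\cdot\rangle_{\mathcal{X}}$-shared random multiplication triple $(a, b, c)$ unknown to the adversary, with $c = a \cdot b$. The protocol is based on the identity $x \cdot y = (x - a + a)\cdot(y - b + b) = de + db + ae + c$, where $d = x - a$ and $e = y - b$. Hence if the parties in $\mathcal{X}$ reconstruct $d$ and $e$, they can locally compute $\langle x \cdot y\rangle_{\mathcal{X}} = de + d \cdot \langle b\rangle_{\mathcal{X}} + e \cdot \langle a\rangle_{\mathcal{X}} + \langle c\rangle_{\mathcal{X}}$. The privacy of $x$ and $y$ is preserved even after $d$ and $e$ are reconstructed, as $x$ and $y$ are masked by the random and private $a$ and $b$ respectively. To reconstruct $d$ and $e$, the parties in $\mathcal{X}$ first locally compute $\langle d\rangle_{\mathcal{X}} = \langle x - a\rangle_{\mathcal{X}}$ and $\langle e\rangle_{\mathcal{X}} = \langle y - b\rangle_{\mathcal{X}}$, and then invoke $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ with inputs $\langle d\rangle_{\mathcal{X}}$ and $\langle e\rangle_{\mathcal{X}}$. Depending on whether the instances of $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ are successful, an honest party in $\mathcal{X}$ outputs either (Success, $i$, $\langle x \cdot y\rangle_i$) or (Failure, $i$, $j$). In the rest of the description, we will say that the parties in $\mathcal{X}$ participate in $\Pi_{\mathrm{BEA}}(\langle x\rangle_{\mathcal{X}}, \langle y\rangle_{\mathcal{X}}, \langle a\rangle_{\mathcal{X}}, \langle b\rangle_{\mathcal{X}}, \langle c\rangle_{\mathcal{X}})$ and each $P_i$ outputs either (Success, $i$, $\langle x \cdot y\rangle_i$) or (Failure, $i$, $j$) to mean the above. The protocol has communication complexity $\mathrm{BC}(|\mathcal{X}|\kappa, |\mathcal{X}|)$.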
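The Beaver identity can be exercised on plain additive shares. The sketch below is an illustration under assumptions: it drops the commitments and the verification performed by $\Pi_{\mathrm{REC}\langle\cdot\rangle}$, uses an arbitrary prime modulus, and lets the first party alone add the public term $de$ (a common convention for adding a public constant to an additive sharing).

```python
import random

P = 2**61 - 1  # illustrative prime modulus (assumption)


def share(x, n):
    """Additive sharing of x among n parties over F_P."""
    parts = [random.randrange(P) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % P]


def reconstruct(shares):
    return sum(shares) % P


def beaver(x_sh, y_sh, a_sh, b_sh, c_sh):
    """Shares of x*y from shares of x, y and of a triple (a, b, c) with c = a*b."""
    d = reconstruct([(x - a) % P for x, a in zip(x_sh, a_sh)])  # public d = x - a
    e = reconstruct([(y - b) % P for y, b in zip(y_sh, b_sh)])  # public e = y - b
    z_sh = [(d * b + e * a + c) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # the public term de is added once
    return z_sh


n = 3
a, b = random.randrange(P), random.randrange(P)
triple = (share(a, n), share(b, n), share(a * b % P, n))
product = beaver(share(12, n), share(34, n), *triple)
print(reconstruct(product))  # 408
```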

D.1 The Preparation Stage of Protocol $\Pi_{\langle\cdot\rangle}$

We are now ready to discuss the preparation stage of our protocol $\Pi_{\langle\cdot\rangle}$. We follow the same outline as the preparation stage of the MPC protocol of [12], which we describe briefly below, followed by the adaptations required in our context. The preparation stage of [12] provides security with abort. Namely, the protocol generates the required $\langle\cdot\rangle$-shared triplets if all the parties behave honestly; otherwise, if the honest parties identify any wrong-doing, they simply abort.

- Triple Generation: The involved parties generate many random $\langle\cdot\rangle$-shared triplets by executing many instances of $\Pi_{\mathrm{TRIPLE}}$, assuming no active corruptions.

- Verification of the Triples via Cut-and-Choose: A random fraction of the triplets is verified via cut-and-choose to detect any cheating attempts. Specifically, a random subset of the generated triplets is selected and the parties are asked to disclose the randomness that they used in the instances of $\Pi_{\mathrm{TRIPLE}}$ for generating the selected triplets. If any cheating is detected, the involved parties abort; otherwise they proceed to the next step. If the test passes, then with high probability the majority of the remaining untested triplets are "good", in the sense that they are honestly generated.

- Proof of Knowledge: The goal of this test is to ensure that for each remaining triplet every party knows its shares, thus ensuring the independence required for UC security. More specifically, during the generation of an untested triplet, a corrupted party $P_i$ could broadcast an arbitrary $C_{a_i,*,*}$, $C_{b_i,*,*}$ or $C_{c_i,*,*}$ while being oblivious to $a_i$, $b_i$ and $c_i$. This is prevented by the following steps: first the parties generate random $\langle\cdot\rangle$-shared values (by calling $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$), then they open the differences between the triplets and those random shared values via protocol $\Pi_{\mathrm{REC}\langle\cdot\rangle}$. Opening these differences is indeed a very simple proof of knowledge (see [12]). Cheating is detected if some of the openings fail. In that case the involved parties abort; otherwise they proceed to the next step.

- Verification of the Triplets via Sacrificing Trick: At this stage, the remaining triplets are verified for correctness via the well-known "sacrificing" trick [13]. Namely, for every pair of remaining shared triplets $(a, b, c)$ and $(x, y, z)$, the parties generate a random $r$ and recompute a $\langle\cdot\rangle$-sharing of $a \cdot b$ using $(rx, ry, r^2 z)$ as a multiplication triplet; protocol $\Pi_{\mathrm{BEA}}$ is used for this purpose. Ideally, if $(a, b, c)$ and $(x, y, z)$ are multiplication triplets, then the difference between the sharing of $c$ and the recomputed $ab$ should be a sharing of zero, which is verified by the parties publicly (using protocol $\Pi_{\mathrm{REC}\langle\cdot\rangle}$). If cheating is detected the parties abort; else they proceed to the next step after discarding $(x, y, z)$, whose security is sacrificed during the verification of $(a, b, c)$. It follows that if the test passes then, except with probability $1/p$ over the choice of $r$, the triplet $(a, b, c)$ is indeed a correct multiplication triplet (see [12] for the details).

- Privacy Amplification: At this stage, the parties jointly perform privacy amplification and "distill" $c_M + c_R$ fully random private triplets from a set of $O((c_M + c_R) + B)$ triplets, where $B$ of them might not be private; recall that $c_M$ and $c_R$ are the number of multiplication and random gates respectively in the circuit $C$. For this, $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ along with $\Pi_{\mathrm{BEA}}$ is used. If any cheating is detected during $\Pi_{\mathrm{BEA}}$, then the parties abort.
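For the sacrificing step, the Beaver recomputation can be written out on plain values to see why a zero difference certifies the pair. Substituting $(rx, ry, r^2 z)$ for the triple in Beaver's formula gives $\widetilde{ab} = ab + r^2(z - xy)$, so $c - \widetilde{ab} = (c - ab) - r^2(z - xy)$. The sketch below (the function name is hypothetical; sharing, commitments and $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ are omitted) computes this difference.

```python
def sacrifice_check(abc, xyz, r, p):
    """Return c - ab~ mod p, where ab~ is a*b recomputed by Beaver's formula
    using (r*x, r*y, r^2*z) as the triple; zero means the check passes."""
    a, b, c = abc
    x, y, z = xyz
    tx, ty, tz = r * x % p, r * y % p, r * r * z % p
    d, e = (a - tx) % p, (b - ty) % p      # the two values opened in Pi_BEA
    ab_tilde = (d * e + d * ty + e * tx + tz) % p
    return (c - ab_tilde) % p


p = 101
print(sacrifice_check((3, 5, 15), (7, 11, 77), r=9, p=p))  # 0: both triples correct
print(sacrifice_check((3, 5, 16), (7, 11, 77), r=9, p=p))  # 1: c is off by one
```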

In our context, it is not enough to abort when a wrong-doing is detected. If some party $P_i \in \mathcal{X}$ identifies any party $P_j \in \mathcal{X}$ cheating in any of the steps of the preparation stage, $P_i$ alerts the parties in $\mathcal{P}$ by raising a complaint against $P_j$. This allows the parties in $\mathcal{P}$ to localize the fault to a pair of parties $(P_i, P_j)$. To simplify the fault localization, we set a designated party $P_{\mathrm{Ref}} \in \mathcal{X}$ with the smallest index Ref as the referee, who locally identifies any fault and reports it to the parties in $\mathcal{P}$. The fault-localization step in each phase of the preparation stage is described below.

- Fault Localization During the Verification of the Triples via Cut-and-Choose: The parties in $\mathcal{X}$ first run the steps for the cut-and-choose triple verification as in [12]. If any party $P_i$ locally identifies any fault, it raises an alarm for the parties in $\mathcal{P}$. On receiving the alarm, every party in $\mathcal{X}$ broadcasts (to the parties in $\mathcal{X}$) its entire view (including the randomness used) in the generation of the triplets under testing. The referee $P_{\mathrm{Ref}}$ then "recomputes" every message a party $P_i \in \mathcal{X}$ should send to every other party $P_j \in \mathcal{X}$ and compares it with what $P_i$ claims to have sent and what $P_j$ claims to have received. In case of any mismatch, $P_{\mathrm{Ref}}$ raises a complaint against both $P_i$ and $P_j$ among $\mathcal{P}$ and urges $P_i$ and $P_j$ to respond. Depending upon the response, the parties can localize the fault to either $(P_{\mathrm{Ref}}, P_i)$, $(P_{\mathrm{Ref}}, P_j)$ or $(P_i, P_j)$. The important observation is that the fault will never be localized to a pair of honest parties from $\mathcal{X}$. This is because the property of $\Pi_{\mathrm{MULT}}$ ensures that if both participating parties are honest, they never conflict with each other.* Hence a localized pair always contains at least one corrupted party.

- Fault Localization in Proof of Knowledge: The parties in $\mathcal{X}$ execute the same steps as in [12] for proving knowledge of their shares. If any party $P_i \in \mathcal{X}$ locally identifies any fault during the instances of $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ (used to open the differences between the triplets and the random shared values), then $P_i$ raises an alarm among the set $\mathcal{P}$, while the referee $P_{\mathrm{Ref}}$ is assigned the task of publicly reporting the identity of the party $P_j$ it has caught cheating. The fault is then localized to $(P_j, P_{\mathrm{Ref}})$. If an honest $P_i$ raises an alarm but a corrupted $P_{\mathrm{Ref}}$ does not identify any cheater, then the fault is localized to $(P_i, P_{\mathrm{Ref}})$. It is easy to see that a localized pair always contains at least one corrupted party.

- Fault Localization During the Verification of the Triplets via Sacrificing Trick: Here the parties in $\mathcal{X}$ first apply the sacrificing trick to each pair of remaining triplets. There are three situations under which a party $P_i \in \mathcal{X}$ can detect a fault. (a) An instance of $\Pi_{\mathrm{BEA}}$ is unsuccessful. In this case, the parties in $\mathcal{P}$ localize the fault in the same way as in the previous step; namely, $P_i$ raises an alarm while $P_{\mathrm{Ref}}$ is asked to identify the cheating party. (b) The instance of $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ used to open the difference of $ab$ and $c$ fails; the fault localization in this case is also the same as in the previous step. (c) The difference of $ab$ and $c$ is non-zero. Clearly, in this case at least one of the two triplets in the pair was not generated correctly, and so the parties in $\mathcal{P}$ perform the fault localization in the same way as in the cut-and-choose step. Namely, all the parties in $\mathcal{X}$ publicly open (to the parties in $\mathcal{X}$) their entire view produced during the generation of the two triplets, and $P_{\mathrm{Ref}}$ is then asked to find a pair of "conflicting" parties.

- Fault Localization in Privacy Amplification: The parties in $\mathcal{X}$ execute the steps for privacy amplification [12]. If any cheating is detected by a party $P_i \in \mathcal{X}$ during the involved instances of $\Pi_{\mathrm{BEA}}$, then the parties in $\mathcal{P}$ perform the fault localization in the same way as for a failed instance of $\Pi_{\mathrm{BEA}}$ in the previous phase.

* For the specific instantiation of $\Pi_{\mathrm{MULT}}$ based on Paillier encryption, this is indeed the case if one of the participating parties in $\Pi_{\mathrm{MULT}}$ is corrupted; see [12] for the details.

The protocol steps for the preparation stage are given in Figure 17, where we give the formal steps for the fault localization only for the first two phases; the formal steps for the fault localization in the remaining phases are omitted to avoid repetition.
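The referee's task in the cut-and-choose fault localization (recompute every message from the broadcast views and compare it with what the receiver claims to have received) can be sketched as follows. The view layout and the deterministic message rule `toy_msg` are hypothetical placeholders for the actual messages of $\Pi_{\mathrm{TRIPLE}}$; only the search for the least-indexed conflicting pair is the point of the example.

```python
def find_conflict(views, msg_fn):
    """Referee-side consistency check over broadcast views (hypothetical format).

    views[p] = {"rand": ..., "received": {(sender, l): value}} is the view of P_p;
    msg_fn(view_a, b, l) recomputes the l-th message P_a should send to P_b from
    P_a's view alone. Returns the least-indexed (a, b, l, expected, claimed)
    mismatch, or None if every claimed message is consistent.
    """
    for a in sorted(views):
        for b in sorted(views):
            if a == b:
                continue
            for (sender, l), claimed in sorted(views[b]["received"].items()):
                if sender != a:
                    continue
                expected = msg_fn(views[a], b, l)
                if expected != claimed:
                    return a, b, l, expected, claimed
    return None


def toy_msg(view, b, l):
    # Hypothetical deterministic message rule, for illustration only.
    return (view["rand"] * (b + 1) + l) % 97


views = {p: {"rand": r, "received": {}} for p, r in enumerate([11, 22, 33])}
for a in views:
    for b in views:
        if a != b:
            for l in range(2):
                views[b]["received"][(a, l)] = toy_msg(views[a], b, l)
print(find_conflict(views, toy_msg))  # None
views[2]["received"][(1, 1)] += 1     # P_2 claims a different message from P_1
print(find_conflict(views, toy_msg))  # (1, 2, 1, 67, 68)
```

The referee only learns that sender and receiver disagree; which of the two is blamed is then decided by their Agree/Disagree responses, as in Fig. 17.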

In the protocol, $B$ and $\lambda$ are two parameters. In [12], it was shown that their preparation stage provides a statistical security of $2^{-B\log_2(1+\lambda)}$; they set $B = 3.6s$ and $\lambda = 1/4$ to achieve a statistical security of $2^{-s}$. Since our preparation stage is almost the same as that of [12], bar the fault-localization steps (which do not affect the statistical security at all), it follows easily via [12] that our preparation stage also provides a statistical security of $2^{-B\log_2(1+\lambda)}$. Intuitively, this is for the following reason. Define a triplet to be good if the adversary could open it correctly during the cut-and-choose step and make an honest party accept (this implies that such a triplet is generated honestly); otherwise call the triplet bad (i.e., such a triplet is not generated honestly, and so the adversary may know some information about the honest parties' shares of it). Then it follows from [12] that if the protocol reaches the Privacy Amplification phase, the probability that the triplets considered during this phase contain more than $B$ bad (and hence non-private) triplets is at most $(1+\lambda)^{-B}$. As a result, the adversary may know at most $B$ points on the polynomials $F(\cdot)$ and $G(\cdot)$ of degree at most $d = (c_M + c_R) + B - 1$; since each of these polynomials is determined by $d + 1 = c_M + c_R + B$ points, this leaves $c_M + c_R$ degrees of freedom in the view of the adversary. Note that, as suggested in [12], instead of creating "big" polynomials $F(\cdot)$ and $G(\cdot)$ of huge degree, we can partition the remaining triplets in $\mathcal{M}$ into batches of smaller size and accordingly use many polynomials of small degree without affecting the security properties; we prefer to present the Privacy Amplification phase as it is presented in [12].
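The polynomial-based privacy amplification can be checked on plain values: $F$ and $G$ are fixed by $d+1$ random values, extended to $2d+1$ points, multiplied pointwise (the step performed with $\Pi_{\mathrm{BEA}}$ in the protocol), and $H$ is interpolated from the $2d+1$ products; every output $(F(\beta), G(\beta), H(\beta))$ at a fresh point is then a multiplication triple. In this sketch the interpolation points $\alpha_k = k$, the output points $-k \bmod p$ and the modulus are illustrative assumptions, and all sharing is omitted.

```python
import random

P = 2**61 - 1  # illustrative prime field (assumption)


def lagrange_eval(xs, ys, t):
    """Value at t of the polynomial of degree < len(xs) through the points (xs, ys)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (t - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def distill(m, B):
    """Plain-value privacy amplification: m triples from 2d + 1 products, d = m + B - 1."""
    d = m + B - 1
    alphas = list(range(1, 2 * d + 2))                    # alpha_1, ..., alpha_{2d+1}
    f = [random.randrange(P) for _ in range(d + 1)]       # F(alpha_k) for k <= d + 1
    g = [random.randrange(P) for _ in range(d + 1)]       # G(alpha_k) for k <= d + 1
    F = [lagrange_eval(alphas[:d + 1], f, x) for x in alphas]
    G = [lagrange_eval(alphas[:d + 1], g, x) for x in alphas]
    h = [fk * gk % P for fk, gk in zip(F, G)]             # computed with Pi_BEA in the protocol
    triples = []
    for k in range(1, m + 1):
        beta = -k % P                                     # fresh point outside the alphas
        triples.append((lagrange_eval(alphas, F, beta),
                        lagrange_eval(alphas, G, beta),
                        lagrange_eval(alphas, h, beta)))
    return triples


for a, b, c in distill(m=3, B=2):
    print(c == a * b % P)  # True
```

Here $m$ plays the role of $c_M + c_R$; since $\deg H \le 2d$, the $2d+1$ products determine $H = F \cdot G$ everywhere.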
Preparation Stage of $\Pi_{\langle\cdot\rangle}$

The public input to the protocol is a set of parties $\mathcal{X} \subseteq \mathcal{P}$ containing at least one honest party and a referee $P_{\mathrm{Ref}} \in \mathcal{X}$ with the smallest index Ref. For the session id sid, every party $P_i \in \mathcal{P}$ participates with (sid, $i$) and does the following:

Triple-generation Assuming No Active Corruption: If $P_i \in \mathcal{X}$ then participate in the protocol $\Pi_{\mathrm{TRIPLE}}$ $(1 + \lambda)(4(c_M + c_R) + 4B - 2)$ times to generate a set $\mathcal{M}$ of $(1 + \lambda)(4(c_M + c_R) + 4B - 2)$ $\langle\cdot\rangle$-shared triplets.

Testing the Triplets via Cut-and-Choose: If $P_i \in \mathcal{X}$ then do the following:

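- Call $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ to sample a random string str that determines a subset $\mathcal{T} \subset \mathcal{M}$ of size $\lambda(4(c_M + c_R) + 4B - 2)$. Set $\mathcal{M} = \mathcal{M} \setminus \mathcal{T}$. Let $\mathrm{View}_i$ denote the randomness used by $P_i$ and the messages received from the other parties in $\mathcal{X}$ during the instances of $\Pi_{\mathrm{TRIPLE}}$ used for generating the triplets in $\mathcal{T}$. Reveal $\mathrm{View}_i$ to the parties in $\mathcal{X}$ by calling $\mathcal{F}_{\mathrm{BC}}$ with (sid, $i$, $\mathrm{View}_i$, $\mathcal{X}$).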

- Corresponding to each $P_j \in \mathcal{X}$, receive (sid, $i$, $j$, $\mathrm{View}_j$) from $\mathcal{F}_{\mathrm{BC}}$. Using $\{\mathrm{View}_j\}_{P_j \in \mathcal{X}}$, reproduce every message that should have been sent by every sender $P_a \in \mathcal{X}$ to every receiver $P_b \in \mathcal{X}$ during the generation of the triplets in $\mathcal{T}$, and compare it with the corresponding value that the recipient $P_b$ claims to have received. If any conflict is detected, then do the following for the smallest-indexed conflicting parties $P_a, P_b$:

• If $P_i \neq P_{\mathrm{Ref}}$, then call $\mathcal{F}_{\mathrm{BC}}$ with (sid, $i$, Err, $\mathcal{P}$) to indicate to the parties in $\mathcal{P}$ that a conflict has been detected.

• Else call $\mathcal{F}_{\mathrm{BC}}$ with (sid, Ref, Err, $P_a$, $P_b$, $l$, $x$, $x'$, $\mathcal{P}$) to indicate that the referee $P_{\mathrm{Ref}}$ identified $P_a, P_b \in \mathcal{X}$ as the least-indexed conflicting parties, together with a message with index $l$ where $P_a$ should have sent $x$ but $P_b$ claimed to receive $x' \neq x$.

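- If the message (sid, $i$, Ref, Err, $P_a$, $P_b$, $l$, $x$, $x'$) is received from $\mathcal{F}_{\mathrm{BC}}$ and $P_i \in \{P_a, P_b\}$, then call $\mathcal{F}_{\mathrm{BC}}$ with (sid, $i$, Agree, $P_{\mathrm{Ref}}$, $\mathcal{P}$) if you agree with $P_{\mathrm{Ref}}$; otherwise call $\mathcal{F}_{\mathrm{BC}}$ with (sid, $i$, Disagree, $P_{\mathrm{Ref}}$, $\mathcal{P}$).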

Fault Localization:

- If the message (sid, $i$, Ref, Err, $P_a$, $P_b$, $l$, $x$, $x'$) is received from $\mathcal{F}_{\mathrm{BC}}$, then: (a) if (sid, $i$, $a$, Disagree, $P_{\mathrm{Ref}}$) is subsequently received from $\mathcal{F}_{\mathrm{BC}}$, output (sid, $i$, Failure, $P_{\mathrm{Ref}}$, $P_a$) and halt; (b) if (sid, $i$, $b$, Disagree, $P_{\mathrm{Ref}}$) is received from $\mathcal{F}_{\mathrm{BC}}$, output (sid, $i$, Failure, $P_{\mathrm{Ref}}$, $P_b$) and halt. Else output (sid, $i$, Failure, $P_a$, $P_b$) and halt.

- If no message of the form (sid, $i$, Ref, Err, $*$, $*$, $*$, $*$, $*$) is received from $\mathcal{F}_{\mathrm{BC}}$, but corresponding to some $P_j \in \mathcal{X}$ the message (sid, $i$, $j$, Err) is received from $\mathcal{F}_{\mathrm{BC}}$, then output (sid, $i$, Failure, $P_{\mathrm{Ref}}$, $P_j$) and halt.

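Proof of Knowledge: If $P_i \in \mathcal{X}$ then do the following for every (untested) triplet $(\langle a\rangle_{\mathcal{X}}, \langle b\rangle_{\mathcal{X}}, \langle c\rangle_{\mathcal{X}})$ in $\mathcal{M}$: sample three random $\langle\cdot\rangle$-shared values $\langle r\rangle_{\mathcal{X}}, \langle s\rangle_{\mathcal{X}}, \langle u\rangle_{\mathcal{X}}$ by invoking $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$. Participate in instances of $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ with $\langle r - a\rangle_{\mathcal{X}}$, $\langle s - b\rangle_{\mathcal{X}}$ and $\langle u - c\rangle_{\mathcal{X}}$. If (sid, Failure, $i$, $j$) is the output in any of the instances of $\Pi_{\mathrm{REC}\langle\cdot\rangle}$, then do the following: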

- If $P_i \neq P_{\mathrm{Ref}}$ then call $\mathcal{F}_{\mathrm{BC}}$ with (sid, $i$, $j$, Err, $\mathcal{P}$) to indicate that cheating has been detected.

- Else, if $P_i = P_{\mathrm{Ref}}$, call $\mathcal{F}_{\mathrm{BC}}$ with (sid, Ref, Err, $j$, $\mathcal{P}$) to indicate that $P_j$ is identified as a cheater; if there are several such $P_j$, select the one with the minimum index $j$.

Fault Localization: If a message (sid, $i$, Ref, Err, $j$) is received from $\mathcal{F}_{\mathrm{BC}}$, then output (sid, $i$, Failure, $P_{\mathrm{Ref}}$, $P_j$) and halt. Else if a message (sid, $i$, $j$, Err) is received from $\mathcal{F}_{\mathrm{BC}}$, then output (sid, $i$, Failure, $P_{\mathrm{Ref}}$, $P_j$) and halt.

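Verification via Sacrificing Trick: If $P_i \in \mathcal{X}$ then do the following for every pair of triplets $(\langle a\rangle_{\mathcal{X}}, \langle b\rangle_{\mathcal{X}}, \langle c\rangle_{\mathcal{X}})$ and $(\langle x\rangle_{\mathcal{X}}, \langle y\rangle_{\mathcal{X}}, \langle z\rangle_{\mathcal{X}})$ in $\mathcal{M}$: call $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ and sample a random $r$.† Participate in $\Pi_{\mathrm{BEA}}(\langle a\rangle_{\mathcal{X}}, \langle b\rangle_{\mathcal{X}}, \langle rx\rangle_{\mathcal{X}}, \langle ry\rangle_{\mathcal{X}}, \langle r^2 z\rangle_{\mathcal{X}})$ for computing $\langle \tilde{c}\rangle_{\mathcal{X}}$, followed by participation in $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ with $\langle c - \tilde{c}\rangle_{\mathcal{X}}$. If no cheating has been identified during $\Pi_{\mathrm{BEA}}$ and $\Pi_{\mathrm{REC}\langle\cdot\rangle}$, and if $c - \tilde{c} = 0$, then store $(\langle a\rangle_{\mathcal{X}}, \langle b\rangle_{\mathcal{X}}, \langle c\rangle_{\mathcal{X}})$ for future use and drop $(\langle x\rangle_{\mathcal{X}}, \langle y\rangle_{\mathcal{X}}, \langle z\rangle_{\mathcal{X}})$ from $\mathcal{M}$. Else proceed to the fault-localization step.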

Fault Localization: If the parties in $\mathcal{X}$ have raised a complaint due to the failure of $\Pi_{\mathrm{BEA}}$ or $\Pi_{\mathrm{REC}\langle\cdot\rangle}$, then localize the fault in the same way as in the fault localization for the Proof of Knowledge step. Else localize the fault in the same way as in the Cut-and-Choose step, by asking the parties in $\mathcal{X}$ to open their entire view of the disputed triplets.

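Privacy Amplification: The parties in $\mathcal{X}$ are now left with $2(c_M + c_R) + 2B - 1$ triplets $\{(\langle a^{(k)}\rangle_{\mathcal{X}}, \langle b^{(k)}\rangle_{\mathcal{X}}, \langle c^{(k)}\rangle_{\mathcal{X}})\}_{k = 1, \ldots, 2(c_M + c_R) + 2B - 1}$ in $\mathcal{M}$. Let $d = (c_M + c_R) + B - 1$. If $P_i \in \mathcal{X}$ then do the following: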

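- Invoke $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ $2(d + 1)$ times to generate $\langle f^{(1)}\rangle_{\mathcal{X}}, \ldots, \langle f^{(d+1)}\rangle_{\mathcal{X}}$ and $\langle g^{(1)}\rangle_{\mathcal{X}}, \ldots, \langle g^{(d+1)}\rangle_{\mathcal{X}}$.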

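- Let $F(\cdot)$ and $G(\cdot)$ be the polynomials of degree at most $d$ such that $F(\alpha_k) = f^{(k)}$ and $G(\alpha_k) = g^{(k)}$ for $k = 1, \ldots, d + 1$. Locally compute $\langle F(\alpha_{d+2})\rangle_{\mathcal{X}}, \ldots, \langle F(\alpha_{2d+1})\rangle_{\mathcal{X}}$ and $\langle G(\alpha_{d+2})\rangle_{\mathcal{X}}, \ldots, \langle G(\alpha_{2d+1})\rangle_{\mathcal{X}}$. For $k = 1, \ldots, 2d + 1$, participate in $\Pi_{\mathrm{BEA}}(\langle F(\alpha_k)\rangle_{\mathcal{X}}, \langle G(\alpha_k)\rangle_{\mathcal{X}}, \langle a^{(k)}\rangle_{\mathcal{X}}, \langle b^{(k)}\rangle_{\mathcal{X}}, \langle c^{(k)}\rangle_{\mathcal{X}})$ for computing $\langle h^{(k)}\rangle_{\mathcal{X}} = \langle F(\alpha_k) \cdot G(\alpha_k)\rangle_{\mathcal{X}}$.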

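- If any cheating is identified during $\Pi_{\mathrm{BEA}}$, then proceed to the fault-localization step. Else let $H(\cdot)$ be the polynomial of degree at most $2d$ such that $H(\alpha_k) = h^{(k)}$ for $k = 1, \ldots, 2d + 1$. Then output (sid, $i$, Success, $\{(\langle a'^{(k)}\rangle_{\mathcal{X}}, \langle b'^{(k)}\rangle_{\mathcal{X}}, \langle c'^{(k)}\rangle_{\mathcal{X}})\}_{k = 1, \ldots, c_M + c_R}$) and halt, where $a'^{(k)} = F(-\alpha_k)$, $b'^{(k)} = G(-\alpha_k)$ and $c'^{(k)} = H(-\alpha_k)$.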

Fault Localization: If any complaint is raised due to the failure of $\Pi_{\mathrm{BEA}}$, then localize the fault as in the Proof of Knowledge step. Else every $P_i \in \mathcal{P} \setminus \mathcal{X}$ outputs (sid, $i$, Success) and halts.

† It is enough to sample a single $r$ for all the pairs of available triplets.

Fig. 17. Generating $c_M + c_R$ $\langle\cdot\rangle$-shared Multiplication Triples with Statistical Security $2^{-B\log_2(1+\lambda)}$
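The triplet counts in Fig. 17 can be tracked phase by phase. The helper below is a hypothetical bookkeeping sketch. Note that $4(c_M + c_R) + 4B - 2 \equiv 2 \pmod 4$, so with $\lambda = 1/4$ the number of tested triplets is not an integer; rounding it up is an assumption of this sketch rather than something stated in the figure.

```python
import math
from fractions import Fraction


def triple_budget(c_m, c_r, B, lam=Fraction(1, 4)):
    """Hypothetical bookkeeping of the triplet counts through the phases of Fig. 17."""
    kept = 4 * (c_m + c_r) + 4 * B - 2   # triplets that survive cut-and-choose
    tested = math.ceil(lam * kept)       # opened in cut-and-choose (rounding up: assumption)
    after_sacrifice = kept // 2          # one triplet of every pair is sacrificed
    d = (c_m + c_r) + B - 1              # degree bound of F and G
    assert after_sacrifice == 2 * d + 1  # matches the 2d + 1 points used for H
    return {
        "generated": kept + tested,
        "tested": tested,
        "after_cut_and_choose": kept,
        "after_sacrifice": after_sacrifice,
        "degree_d": d,
        "output": c_m + c_r,
    }


print(triple_budget(c_m=10, c_r=2, B=36))  # B = 3.6s with s = 10
```

The internal assertion confirms that the $2(c_M + c_R) + 2B - 1$ triplets left after sacrificing are exactly the $2d + 1$ products needed to interpolate $H$.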
D.2 Protocol $\Pi_{\langle\cdot\rangle}$

In this section the protocol $\Pi_{\langle\cdot\rangle}$ is presented in Figure 18, where during the circuit-evaluation stage we follow the idea outlined earlier in Section 3.2. Note that during the circuit evaluation an instance of $\Pi_{\mathrm{BEA}}$ may fail, in which case the parties in $\mathcal{P}$ localize the fault via the referee $P_{\mathrm{Ref}}$ in the same way as in the preparation stage.


Protocol $\Pi_{\langle\cdot\rangle}$

The public input to the protocol is a set of parties $\mathcal{X} \subseteq \mathcal{P}$ containing at least one honest party and an arithmetic circuit $C$ over $\mathbb{F}_p$ consisting of $\mathrm{in}$ input gates, $\mathrm{out}$ output gates, $c_M$ multiplication gates and $c_R$ random gates. In addition, $\langle x_1\rangle_{\mathcal{X}}, \ldots, \langle x_{\mathrm{in}}\rangle_{\mathcal{X}}$ are the $\langle\cdot\rangle$-shared inputs for $C$. Let $P_{\mathrm{Ref}} \in \mathcal{X}$ be the party with the smallest index Ref, who is set as the referee to localize any fault that occurs during the protocol.

For the session id sid, party $P_i \in \mathcal{P}$ participates with (sid, $i$) and does the following:

Preparation  Stage:  execute  the  steps  of  Figure  17. 

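Computation Stage: If (sid, $i$, Success, $\{(\langle a'^{(k)}\rangle_{\mathcal{X}}, \langle b'^{(k)}\rangle_{\mathcal{X}}, \langle c'^{(k)}\rangle_{\mathcal{X}})\}_{k = 1, \ldots, c_M + c_R}$) or (sid, $i$, Success) is obtained at the end of the preparation stage, then do the following: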

- $\langle\cdot\rangle$-shared Evaluation of the Circuit $C$: If $P_i \in \mathcal{X}$ then do the following for every gate in the circuit $C$:

• Input Gate: For $l = 1, \ldots, \mathrm{in}$, associate $\langle x_l\rangle_{\mathcal{X}}$ with the corresponding input gate of $C$.

• Random Gate: For the $k$-th random gate in $C$, where $k \in \{1, \ldots, c_R\}$, associate $\langle a'^{(k)}\rangle_{\mathcal{X}}$ as the output of the random gate.

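• Addition Gate: If $\langle x\rangle_{\mathcal{X}}$ and $\langle y\rangle_{\mathcal{X}}$ are the $\langle\cdot\rangle$-shared inputs of the gate, then locally compute $\langle x + y\rangle_{\mathcal{X}} = \langle x\rangle_{\mathcal{X}} + \langle y\rangle_{\mathcal{X}}$ and associate it as the output of the addition gate.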

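• Multiplication Gate: For the $k$-th multiplication gate in $C$ with the $\langle\cdot\rangle$-shared inputs $\langle x\rangle_{\mathcal{X}}$ and $\langle y\rangle_{\mathcal{X}}$, where $k \in \{1, \ldots, c_M\}$, associate the triplet $(\langle a'^{(c_R+k)}\rangle_{\mathcal{X}}, \langle b'^{(c_R+k)}\rangle_{\mathcal{X}}, \langle c'^{(c_R+k)}\rangle_{\mathcal{X}})$. Participate in $\Pi_{\mathrm{BEA}}(\langle x\rangle_{\mathcal{X}}, \langle y\rangle_{\mathcal{X}}, \langle a'^{(c_R+k)}\rangle_{\mathcal{X}}, \langle b'^{(c_R+k)}\rangle_{\mathcal{X}}, \langle c'^{(c_R+k)}\rangle_{\mathcal{X}})$ to compute $\langle x \cdot y\rangle_{\mathcal{X}}$. If (sid, $i$, Failure, $j$) with $P_j \in \mathcal{X}$ is obtained during the instance of $\Pi_{\mathrm{BEA}}$, then do the following: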

* If $P_i \neq P_{\mathrm{Ref}}$ then call $\mathcal{F}_{\mathrm{BC}}$ with (sid, $i$, Err, $\mathcal{P}$) to indicate that cheating has been detected while executing $\Pi_{\mathrm{BEA}}$.

* Else, if $P_i = P_{\mathrm{Ref}}$, call $\mathcal{F}_{\mathrm{BC}}$ with (sid, Ref, Err, $j$, $\mathcal{P}$) to indicate that $P_j$ is identified as a cheater while executing $\Pi_{\mathrm{BEA}}$; if there are several such $P_j$, select the one with the minimum index $j$.

- Fault Localization:

• If there exists a multiplication gate in $C$ for which a message (sid, $i$, Ref, Err, $j$) is received from $\mathcal{F}_{\mathrm{BC}}$ on behalf of $P_{\mathrm{Ref}}$, then output (sid, $i$, Failure, $P_{\mathrm{Ref}}$, $P_j$) and halt.

• Else, if there exists a multiplication gate in $C$ for which a message (sid, $i$, $j$, Err) is received from $\mathcal{F}_{\mathrm{BC}}$ on behalf of $P_j \in \mathcal{X}$, but no message of the form (sid, $i$, Ref, Err, $*$) is received from $\mathcal{F}_{\mathrm{BC}}$ on behalf of $P_{\mathrm{Ref}}$, then output (sid, $i$, Failure, $P_{\mathrm{Ref}}$, $P_j$) and halt.

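• Else, if $P_i \in \mathcal{P} \setminus \mathcal{X}$, output (sid, $i$, Success) and halt; otherwise output (sid, $i$, Success, $\langle y_1\rangle_{\mathcal{X}}, \ldots, \langle y_{\mathrm{out}}\rangle_{\mathcal{X}}$) and halt, where $\langle y_1\rangle_{\mathcal{X}}, \ldots, \langle y_{\mathrm{out}}\rangle_{\mathcal{X}}$ are the $\langle\cdot\rangle$-shared outputs associated with the output gates of $C$.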


Fig. 18. Protocol for Secure $\langle\cdot\rangle$-shared Evaluation of a Given Circuit $C$ with Statistical Security $1 - 2^{-B\log_2(1+\lambda)}$
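The gate-by-gate evaluation of Fig. 18 can be mirrored on plain additive shares. This is a sketch under assumptions: commitments, $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ verification and fault localization are omitted, `dealer_triples` is a hypothetical trusted stand-in for the preparation stage, the gate encoding is invented for illustration, and indices are 0-based (random gate $k$ takes the first component of triple $k$, multiplication gate $k$ takes triple $c_R + k$).

```python
import random

P = 2**61 - 1  # illustrative prime modulus (assumption)


def share(x, n):
    parts = [random.randrange(P) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % P]


def reveal(sh):
    return sum(sh) % P


def dealer_triples(count, n):
    """Hypothetical trusted dealer standing in for the preparation stage (Fig. 17)."""
    triples = []
    for _ in range(count):
        a, b = random.randrange(P), random.randrange(P)
        triples.append((share(a, n), share(b, n), share(a * b % P, n)))
    return triples


def evaluate(circuit, inputs, triples, c_r):
    """Evaluate gates in order; wire w holds the additive shares output by gate w."""
    wires, k_rand, k_mult = [], 0, 0
    for op, args in circuit:
        if op == "input":
            w = inputs[args[0]]
        elif op == "random":
            w = triples[k_rand][0]                    # output a of triple k_rand
            k_rand += 1
        elif op == "add":
            w = [(x + y) % P for x, y in zip(wires[args[0]], wires[args[1]])]
        elif op == "mul":
            a, b, c = triples[c_r + k_mult]           # triple reserved for this gate
            k_mult += 1
            x, y = wires[args[0]], wires[args[1]]
            d = reveal([(xi - ai) % P for xi, ai in zip(x, a)])
            e = reveal([(yi - bi) % P for yi, bi in zip(y, b)])
            w = [(d * bi + e * ai + ci) % P for ai, bi, ci in zip(a, b, c)]
            w[0] = (w[0] + d * e) % P
        else:
            raise ValueError(f"unknown gate {op!r}")
        wires.append(w)
    return wires


n, c_r = 3, 1
inputs = [share(5, n), share(7, n)]
circuit = [("input", (0,)), ("input", (1,)), ("random", ()), ("mul", (0, 1)), ("add", (3, 0))]
triples = dealer_triples(count=2, n=n)  # one for the random gate, one for the multiplication
wires = evaluate(circuit, inputs, triples, c_r)
print(reveal(wires[4]))  # 40 = 5 * 7 + 5
```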

The correctness of the protocol follows from the binding property of the commitment scheme and the detailed informal discussion above, while we appeal to [12] for the proof of privacy in the UC framework. We now prove Lemma 4 (stated in Section 3), setting $\lambda = 1/4$ and $B = 3.6s$ as done in [12], so that the protocol provides a statistical security of $2^{-s}$.
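The parameter choice can be checked numerically: with $B = 3.6s$ and $\lambda = 1/4$, the error bound $(1+\lambda)^{-B} = 2^{-B\log_2(1+\lambda)}$ has exponent $3.6\log_2(1.25)\,s \approx 1.159\,s \ge s$. The helper name below is hypothetical.

```python
import math


def security_exponent(s, b_factor=3.6, lam=0.25):
    """e such that (1 + lam)^(-B) = 2^(-e) for B = b_factor * s."""
    B = b_factor * s
    return B * math.log2(1 + lam)


for s in (40, 80, 128):
    print(s, round(security_exponent(s), 2), security_exponent(s) >= s)
```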

Proof of Lemma 4: We prove the communication complexity of the preparation stage, with the observation that $c_M + c_R = O(|C|)$. During the Triple-generation phase, $O(c_M + c_R + B)$ instances of $\Pi_{\mathrm{TRIPLE}}$ are executed by the parties in $\mathcal{X}$, thus requiring communication complexity of $O(|\mathcal{X}|^2(|C| + B)\kappa)$ and $\mathrm{BC}(|\mathcal{X}|(|C| + B)\kappa, |\mathcal{X}|)$.

During the Cut-and-Choose phase, $O(c_M + c_R + B)$ calls to $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ are made for generating $O(c_M + c_R + B)$ random $\langle\cdot\rangle$-shared commitments with statistical security, incurring communication complexity of $\mathrm{BC}(|\mathcal{X}|(|C| + B)\kappa, |\mathcal{X}|)$. In addition, the parties in $\mathcal{X}$ need to broadcast among themselves their entire view of $\Pi_{\mathrm{TRIPLE}}$ with respect to $O(c_M + c_R + B)$ triplets. This incurs a communication complexity of $\mathrm{BC}(|\mathcal{X}|^2(|C| + B)\kappa, |\mathcal{X}|)$. During the fault-localization step, the parties in $\mathcal{X}$ need to broadcast $O(\kappa)$ bits to the parties in $\mathcal{P}$, thus requiring communication complexity of $\mathrm{BC}(|\mathcal{X}|\kappa, n)$.

During the Proof of Knowledge phase, $O(c_M + c_R + B)$ calls to $\mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}$ are made and $O(c_M + c_R + B)$ instances of $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ are executed by the parties in $\mathcal{X}$, thus requiring a communication complexity of $\mathrm{BC}(|\mathcal{X}|(|C| + B)\kappa, |\mathcal{X}|)$. In addition, during the fault-localization step, the parties in $\mathcal{X}$ need to broadcast $O(\kappa)$ bits to the parties in $\mathcal{P}$, thus requiring communication complexity of $\mathrm{BC}(|\mathcal{X}|\kappa, n)$.

During the Correctness phase, $O(c_M + c_R + B)$ instances of $\Pi_{\mathrm{BEA}}$ and $\Pi_{\mathrm{REC}\langle\cdot\rangle}$ are executed by the parties in $\mathcal{X}$. In addition, the parties in $\mathcal{X}$ may need to publicly open among themselves their entire view of $\Pi_{\mathrm{TRIPLE}}$ with respect to a disputed pair of triplets. Thus this phase has communication complexity of $\mathrm{BC}(|\mathcal{X}|(|C| + B)\kappa, |\mathcal{X}|)$, with an additional communication complexity of $\mathrm{BC}(|\mathcal{X}|\kappa, n)$ for the fault-localization step. It follows easily that the Privacy Amplification phase, as well as the circuit-evaluation stage, has communication complexity of $\mathrm{BC}(|\mathcal{X}|(|C| + B)\kappa, |\mathcal{X}|)$ for executing the steps within $\mathcal{X}$ and communication complexity of $\mathrm{BC}(|\mathcal{X}|\kappa, n)$ for any possible fault localization.

During the computation stage, $c_M$ instances of $\Pi_{\mathrm{BEA}}$ are executed and fault localization is done at most once. It thus follows that, setting $B = 3.6s$, the protocol has communication complexity $O(|\mathcal{X}|^2(|C| + s)\kappa)$, $\mathrm{BC}(|\mathcal{X}|^2(|C| + s)\kappa, |\mathcal{X}|)$ and $\mathrm{BC}(|\mathcal{X}|\kappa, n)$. □

E  Proof  of  Theorem  1 

Security. We prove security by designing a simulator for the protocol $\Pi_f$. Let $\mathcal{T} \subset \mathcal{P}$ be the set of parties under the control of $\mathcal{A}$ during the protocol $\Pi_f$; we present a simulator $\mathcal{S}_f$ (interacting with the functionality $\mathcal{F}_f$) for $\mathcal{A}$ in Figure 19. The high-level idea for the simulator is the following: the simulator takes the inputs of the corrupted parties and interacts with $\mathcal{F}_f$ to obtain the function output $y$. The simulator then invokes $\mathcal{A}$ with these inputs and simulates, step by step, each message that $\mathcal{A}$ would have received in the protocol $\Pi_f$ from the honest parties and from the functionalities called therein. Notice that the simulator $\mathcal{S}_f$ also needs to simulate the protocol steps of the honest parties for the sub-protocols $\Pi_{[\cdot]}^{\langle\cdot\rangle}$, $\Pi_{\langle\cdot\rangle}^{[\cdot]}$ and $\Pi_{\mathrm{RandZero}[\cdot]}$. Specifying the simulator steps for these sub-protocols would make the description of $\mathcal{S}_f$ complicated. So, for ease of presentation, we define three sub-simulators $\mathcal{S}_{[\cdot]}^{\langle\cdot\rangle}$ (Fig. 20), $\mathcal{S}_{\langle\cdot\rangle}^{[\cdot]}$ (Fig. 21) and $\mathcal{S}_{\mathrm{RandZero}[\cdot]}$ (Fig. 22), which are invoked by $\mathcal{S}_f$ for simulating the steps of the honest parties for the instances of $\Pi_{[\cdot]}^{\langle\cdot\rangle}$, $\Pi_{\langle\cdot\rangle}^{[\cdot]}$ and $\Pi_{\mathrm{RandZero}[\cdot]}$ respectively; technically, the steps specified for these sub-simulators are actually performed by the main simulator $\mathcal{S}_f$. While invoking these "sub-simulators", $\mathcal{S}_f$ provides its entire internal state to them, and the sub-simulators then return their internal state (after the required simulation) to the main simulator. Similarly, we also assume the presence of a simulator $\mathcal{S}_{\langle\cdot\rangle}$, which can be invoked by $\mathcal{S}_f$ to simulate the steps of the honest parties for the protocol $\Pi_{\langle\cdot\rangle}$. We do not explicitly give the steps of $\mathcal{S}_{\langle\cdot\rangle}$, but rather appeal to the simulator of the MPC protocol of [12], because the steps of $\Pi_{\langle\cdot\rangle}$ are almost the same as those of the MPC protocol of [12], bar the fault-localization steps. Simulating the fault-localization steps is straightforward, since the simulator knows the entire state of every honest party in $\Pi_{\langle\cdot\rangle}$, and so any wrong-doing by the corrupted parties can be identified by the simulator exactly as it would be identified by an honest party in $\Pi_{\langle\cdot\rangle}$.

It is easy to show that $\mathrm{IDEAL}_{\mathcal{F}_f, \mathcal{S}_f, \mathcal{Z}} \approx \mathrm{REAL}_{\Pi_f, \mathcal{A}, \mathcal{Z}}$ in the $(\mathcal{F}_{\mathrm{CRS}}, \mathcal{F}_{\mathrm{BC}}, \mathcal{F}_{\mathrm{COMMITTEE}}, \mathcal{F}_{\mathrm{GEN}[\cdot]}, \mathcal{F}_{\mathrm{GENRAND}\langle\cdot\rangle}, \mathcal{F}_{\mathrm{ZK.BC}})$-hybrid setting due to the privacy of the secret-sharing schemes and the statistical hiding prop-

Simulator $\mathcal{S}_f$

The simulator plays the role of the honest parties and simulates each step of the protocol $\Pi_f$ as follows. The communication of $\mathcal{Z}$ with the adversary $\mathcal{A}$ is handled as follows: every input value received by the simulator from $\mathcal{Z}$ is written on $\mathcal{A}$'s input tape. Likewise, every output value written by $\mathcal{A}$ on its output tape is copied to the simulator's output tape (to be read by the environment $\mathcal{Z}$). The simulator then does the following for the session ID sid:

Initialization. $\mathcal{S}_f$ sets its internal variables $\mathcal{C} = \mathcal{P}$, $\hat{n} = n$, $\hat{t} = t$ and NewCom $= 1$.

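CRS Generation. On receiving (sid, $i$) from every $P_i \in \mathcal{T}$, simulator $\mathcal{S}_f$, on behalf of $\mathcal{F}_{\mathrm{CRS}}$, computes $\mathrm{Gen}(1^\kappa) \rightarrow (ck, \tau_0, \tau_1)$ and $\mathrm{G}(1^\kappa) \rightarrow (pk, sk)$, sets CRS $= (ck, pk)$ and sends (sid, $i$, CRS) to every $P_i \in \mathcal{T}$.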

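Input commitment. On behalf of every honest party $P_i \in \mathcal{P} \setminus \mathcal{T}$, $\mathcal{S}_f$ picks three random polynomials $f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot)$ over $\mathbb{F}_p$ of degree $t$ such that $f^{(i)}(0) = 0$, and imitates the behavior of the honest parties. That is, $\mathcal{S}_f$ computes the commitment $C_{f^{(i)}(0),g^{(i)}(0),h^{(i)}(0)} = \mathsf{Comm}_{ck}(f^{(i)}(0); g^{(i)}(0), h^{(i)}(0))$ and sends $(\mathsf{sid}, i, C_{f^{(i)}(0),g^{(i)}(0),h^{(i)}(0)})$ to every corrupted $P_j \in \mathcal{T}$ on behalf of $\mathcal{F}_{\mathrm{BC}}$. When a corrupted $P_i \in \mathcal{T}$ invokes $\mathcal{F}_{\mathrm{BC}}$ with $(\mathsf{sid}, i, C_{f^{(i)}(0),g^{(i)}(0),h^{(i)}(0)})$, simulator $\mathcal{S}_f$ acts on behalf of $\mathcal{F}_{\mathrm{BC}}$ and sends $(\mathsf{sid}, i, C_{f^{(i)}(0),g^{(i)}(0),h^{(i)}(0)})$ to every $P_j \in \mathcal{T}$.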

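$[\cdot]$-sharing of Inputs. For every honest $P_i \in \mathcal{P} \setminus \mathcal{T}$, simulator $\mathcal{S}_f$ acts on behalf of functionality $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ with $(\mathsf{sid}, i, f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot))$ and hands $(\mathsf{sid}, j, i, [f^{(i)}(0)]_j)$ to every $P_j \in \mathcal{T}$. Then, for every corrupted $P_i \in \mathcal{T}$, on receiving $(\mathsf{sid}, i, f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot))$ from $P_i$ (as the dealer), $\mathcal{S}_f$, on behalf of $\mathcal{F}_{\mathrm{GEN}[\cdot]}$, sends $(\mathsf{sid}, j, i, [f^{(i)}(0)]_j)$ to every $P_j \in \mathcal{T}$, after verifying the polynomials $f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot)$ with respect to the corresponding commitment $C_{f^{(i)}(0),g^{(i)}(0),h^{(i)}(0)}$, as done by the functionality $\mathcal{F}_{\mathrm{GEN}[\cdot]}$. Locally, the simulator maintains the following information: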

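- $\mathcal{S}_f$ stores the input of each corrupted $P_i \in \mathcal{T}$ as $x^{(i)} = f^{(i)}(0)$, where $f^{(i)}(\cdot)$ is received from the corrupted $P_i$. Further, it sets the input of each honest $P_i \in \mathcal{P} \setminus \mathcal{T}$ as $x^{(i)} = 0$.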

- For every $P_i \in \mathcal{P}$, it stores the entire

$\mathcal{S}_f$ hands $\{x^{(i)}\}_{P_i \in \mathcal{T}}$ to the MPC functionality $\mathcal{F}_f$ on behalf of the corrupted parties and gets back the output $y$ from the functionality. Next, $\mathcal{S}_f$ evaluates the circuit using 0s as the inputs of the honest parties and $\{x^{(i)}\}_{P_i \in \mathcal{T}}$ as the inputs of the corrupted parties. For these inputs, it knows the value associated with each wire of the circuit. Thus it knows the circuit output $\bar{y}$ resulting from this set of inputs, namely 0s as the inputs of the honest parties and $\{x^{(i)}\}_{P_i \in \mathcal{T}}$ as the inputs of the corrupted parties.

Start of while loop over the sub-circuits. Set $\ell = 1$ and, while $\ell \le L$, $\mathcal{S}_f$ continues as follows:

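- Committee Selection. If NewCom = 1, on receiving $(\mathsf{sid}, i, \hat{\mathcal{P}})$ from every party $P_i \in \mathcal{T}$, $\mathcal{S}_f$, on behalf of $\mathcal{F}_{\mathrm{COMMITTEE}}$, picks $c$ parties from its local set $\hat{\mathcal{P}}$ at random and assigns them to $\mathcal{C}$. It then sends $(\mathsf{sid}, P_i, \mathcal{C})$ to every $P_i \in \mathcal{T}$.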

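- $[\cdot]$ to $\langle\cdot\rangle_{\mathcal{C}}$ Conversion of Inputs of $\mathsf{ckt}_\ell$. Let $[x_1], \ldots, [x_{in_\ell}]$ denote the $[\cdot]$-sharings of the inputs to the sub-circuit $\mathsf{ckt}_\ell$. For $k \in \{1, \ldots, in_\ell\}$, $\mathcal{S}_f$ invokes the sub-simulator $\mathcal{S}_{[\cdot]\to\langle\cdot\rangle}$ (Fig. 20), which simulates the steps of the honest parties in $\Pi_{[\cdot]\to\langle\cdot\rangle}$, with $(\mathsf{sid}, \{[x_k]_i\}_{P_i \in \mathcal{P}\setminus\mathcal{T}}, \mathcal{C})$ (namely with the shares corresponding to the honest parties). The sub-simulator returns $(\mathsf{sid}, \{\langle x_k\rangle_i\}_{P_i \in \mathcal{C}\cap(\mathcal{P}\setminus\mathcal{T})})$ to $\mathcal{S}_f$.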

- Evaluation of the Sub-circuit $\mathsf{ckt}_\ell$. The simulator $\mathcal{S}_f$ invokes the MPC sub-simulator (namely the simulator of the MPC protocol of [12], with the appropriate modifications in our context to do fault localization) to simulate the steps of the honest parties in the MPC protocol executed by the committee $\mathcal{C}$.

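- $\langle\cdot\rangle_{\mathcal{C}}$ to $[\cdot]$ Conversion of Outputs of $\mathsf{ckt}_\ell$. $\mathcal{S}_f$ invokes $\mathcal{S}_{\langle\cdot\rangle\to[\cdot]}$ with $(\mathsf{sid}, \{\langle y_k\rangle_i\}_{P_i \in \mathcal{C}\cap(\mathcal{P}\setminus\mathcal{T})}, \mathcal{C})$ for every $k \in \{1, \ldots, out_\ell\}$ and gets back either $(\mathsf{sid}, \{[y_k]_i\}_{P_i \in \mathcal{P}\setminus\mathcal{T}})$, or $(\mathsf{sid}, i, \mathsf{Failure}, P_a, P_b)$, or $(\mathsf{sid}, i, \mathsf{Failure}, P_a)$, and does the following:

- If $(\mathsf{sid}, \{[y_k]_i\}_{P_i \in \mathcal{P}\setminus\mathcal{T}})$ is received for every $k$, increment $\ell = \ell + 1$, set NewCom = 0, store the sharings and return to the while loop.

- If $(\mathsf{sid}, i, \mathsf{Failure}, P_a, P_b)$ is received for some $k \in \{1, \ldots, out_\ell\}$, update $\hat{\mathcal{P}} = \hat{\mathcal{P}} \setminus \{P_a, P_b\}$, $\hat{t} = \hat{t} - 1$ and $\hat{n} = \hat{n} - 2$.

- If $(\mathsf{sid}, i, \mathsf{Failure}, P_a)$ is received for some $k \in \{1, \ldots, out_\ell\}$, update $\hat{\mathcal{P}} = \hat{\mathcal{P}} \setminus \{P_a\}$, $\hat{t} = \hat{t} - 1$ and $\hat{n} = \hat{n} - 1$.

- In either failure case, set NewCom = 1 and go to the Committee Selection step.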

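Output Rerandomization. Let $[\bar{y}]$ denote the $[\cdot]$-sharing of the output of ckt. $\mathcal{S}_f$ invokes $\mathcal{S}_{\mathrm{RANDZERO}[\cdot]}$ with input $(\mathsf{sid}, y - \bar{y})$. $\mathcal{S}_{\mathrm{RANDZERO}[\cdot]}$ simulates the honest parties in protocol $\Pi_{\mathrm{RANDZERO}[\cdot]}$ and returns $(\mathsf{sid}, \{[y - \bar{y}]_i\}_{P_i \in \mathcal{P}\setminus\mathcal{T}})$ to $\mathcal{S}_f$. $\mathcal{S}_f$ locally computes $[y]_i = [\bar{y}]_i + [y - \bar{y}]_i$ for every $P_i \in \mathcal{P}\setminus\mathcal{T}$.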

Output Computation. On behalf of every honest $P_i$, $\mathcal{S}_f$ sends $(\mathsf{sid}, i, j, f_i, g_i, h_i)$ to every $P_j \in \mathcal{T}$, where $[y]_i = (f_i, g_i, h_i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$. Every $P_i \in \mathcal{T}$ will then recover $y$ at the end, due to the output rerandomization step.

The simulator then outputs $\mathcal{A}$'s output and terminates.


Fig. 19. Simulator for the adversary $\mathcal{A}$ corrupting at most $t$ parties in the set $\mathcal{T} \subset \mathcal{P}$ in the protocol $\Pi_f$.
erty of the underlying commitment scheme. For the correctness of the protocol, we rely on the trapdoor security and binding properties of the underlying double trapdoor commitment scheme.


Simulator $\mathcal{S}_{[\cdot]\to\langle\cdot\rangle}$

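For session id sid, on receiving $(\mathsf{sid}, \{[s]_i\}_{P_i \in \mathcal{P}\setminus\mathcal{T}}, \mathcal{X})$ from $\mathcal{S}_f$, $\mathcal{S}_{[\cdot]\to\langle\cdot\rangle}$ interacts with the corrupted parties in $\mathcal{T}$ on behalf of the honest parties in an instance of Protocol $\Pi_{[\cdot]\to\langle\cdot\rangle}$. The simulator $\mathcal{S}_{[\cdot]\to\langle\cdot\rangle}$ is aware of the internal state of the honest parties corresponding to $[s]$. The simulator proceeds as follows: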

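Verifiably $\langle\cdot\rangle$-sharing the Share and Opening Information in $[s]_i$. First, $\mathcal{S}_{[\cdot]\to\langle\cdot\rangle}$ acts on behalf of every honest $P_i$ as the dealer in an instance of the verifiable $\langle\cdot\rangle$-sharing sub-protocol with $(\mathsf{sid}, [s]_i, \mathcal{X})$, where $[s]_i = (f_i, g_i, h_i, \{C_{f_j,g_j,h_j}\}_{P_j \in \mathcal{P}})$. It also simulates every honest party $P_i$ in each instance of that sub-protocol where a corrupted $P_k \in \mathcal{T}$ acts as the dealer.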

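Identifying the Correctly $\langle\cdot\rangle$-shared Shares of $s$ and Generating $\langle s\rangle_{\mathcal{X}}$. If a corrupted $P_i \in \mathcal{T}$ is caught cheating during the conflict resolution step, then exclude it from a set $\mathcal{H}$ that is initialized to $\mathcal{P}$. Otherwise, for every corrupted $P_i \in \mathcal{T}$, the simulator receives the share $\langle f_i\rangle_k$ on behalf of every honest $P_k \in (\mathcal{P}\setminus\mathcal{T}) \cap \mathcal{X}$. Without loss of generality, let$^a$ $\mathcal{H} = \{P_1, \ldots, P_{|\mathcal{H}|}\}$ and let $c_1, \ldots, c_{|\mathcal{H}|}$ be the publicly known Lagrange interpolation coefficients such that $c_1 f_1 + \cdots + c_{|\mathcal{H}|} f_{|\mathcal{H}|} = s$. Then the simulator locally computes $\langle s\rangle_i = c_1\langle f_1\rangle_i + \cdots + c_{|\mathcal{H}|}\langle f_{|\mathcal{H}|}\rangle_i$ on behalf of every honest $P_i \in \mathcal{X}$ and returns $(\mathsf{sid}, \{\langle s\rangle_i\}_{P_i \in \mathcal{X}\cap(\mathcal{P}\setminus\mathcal{T})})$ to $\mathcal{S}_f$.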


$^a$ The set $\mathcal{H}$ will be of size more than $t + 1$.


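Fig. 20. Simulator $\mathcal{S}_{[\cdot]\to\langle\cdot\rangle}$ to be Invoked by the MPC Simulator $\mathcal{S}_f$ for Simulating the Steps of Sub-protocol $\Pi_{[\cdot]\to\langle\cdot\rangle}$ in $\Pi_f$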
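The conversion in Fig. 20 relies on the linearity of Shamir sharing: if each share $f_i$ is re-shared to the committee with a degree-$t$ polynomial, then applying the public Lagrange coefficients $c_i$ for $\mathcal{H}$ share-wise yields a degree-$t$ sharing of $s$ among $\mathcal{X}$. A minimal sketch over a toy field, assuming plain Shamir sharing and ignoring the commitments and opening information carried in $[\cdot]$ and $\langle\cdot\rangle$; the field size, the evaluation points, the committee and all function names are hypothetical.

```python
import random

p = 101  # toy prime field F_p (hypothetical choice, far too small for real use)


def share(secret, t, xs, rng):
    """Shamir-share `secret` with a random degree-t polynomial; return {x: f(x)}."""
    coeffs = [secret % p] + [rng.randrange(p) for _ in range(t)]
    return {x: sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p for x in xs}


def lagrange_at_zero(xs):
    """Public coefficients c_x with sum_x c_x * f(x) = f(0) whenever deg f < len(xs)."""
    coeffs = {}
    for i in xs:
        num, den = 1, 1
        for j in xs:
            if j != i:
                num = num * (-j) % p
                den = den * (i - j) % p
        coeffs[i] = num * pow(den, -1, p) % p
    return coeffs


def reconstruct(shares):
    c = lagrange_at_zero(list(shares))
    return sum(c[x] * y for x, y in shares.items()) % p


rng = random.Random(7)
t = 2
parties = [1, 2, 3, 4, 5, 6, 7]      # evaluation points of all parties
committee = [2, 3, 5, 6, 7]          # hypothetical committee X (a subset of the parties)
H = [1, 2, 4, 6]                     # parties whose re-sharings were accepted (>= t + 1)

s = 42
f = share(s, t, parties, rng)                              # [s]: party i holds f_i = f(i)
sub = {i: share(f[i], t, committee, rng) for i in H}        # each f_i is re-shared to X
c = lagrange_at_zero(H)
s_committee = {k: sum(c[i] * sub[i][k] for i in H) % p for k in committee}  # <s>_k
```

Because every re-sharing polynomial has degree $t$, their public linear combination also has degree $t$, and its constant term is $\sum_i c_i f_i = s$.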


Simulator $\mathcal{S}_{\langle\cdot\rangle\to[\cdot]}$

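For session id sid, on receiving $(\mathsf{sid}, \{\langle s\rangle_i\}_{P_i \in \mathcal{X}\cap(\mathcal{P}\setminus\mathcal{T})}, \mathcal{X})$ from $\mathcal{S}_f$, $\mathcal{S}_{\langle\cdot\rangle\to[\cdot]}$ interacts with the corrupted parties in $\mathcal{T}$ on behalf of the honest parties in an instance of Protocol $\Pi_{\langle\cdot\rangle\to[\cdot]}$. Note that the simulator is aware of the internal state of the honest parties corresponding to $\langle s\rangle$. The simulator proceeds as follows: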

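- $\mathcal{S}_{\langle\cdot\rangle\to[\cdot]}$, on behalf of every honest $P_i \in \mathcal{X}$, interprets $\langle s\rangle_i$ as $(s_i, u_i, v_i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ and sends $(\mathsf{sid}, k, i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ to every $P_k \in \mathcal{T}$ (acting on behalf of the functionality $\mathcal{F}_{\mathrm{BC}}$ that would have been called by $P_i$ in the hybrid protocol). On receiving $(\mathsf{sid}, i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ from every $P_i \in \mathcal{T}$, it acts on behalf of $\mathcal{F}_{\mathrm{BC}}$ and sends $(\mathsf{sid}, k, i, \{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}})$ to every $P_k \in \mathcal{T}$.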

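- If there exists a pair of parties $P_a, P_b \in \mathcal{X}$ such that $P_a$ is honest, $P_b \in \mathcal{T}$, and the set $\{C_{s_j,u_j,v_j}\}_{P_j \in \mathcal{X}}$ broadcast by $P_a$ differs from the one broadcast by $P_b$, then return $(\mathsf{sid}, \mathsf{Failure}, P_a, P_b)$ to $\mathcal{S}_f$ and halt.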

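- On behalf of every honest $P_i \in \mathcal{X}$, $\mathcal{S}_{\langle\cdot\rangle\to[\cdot]}$ acts as a dealer D and selects $f^{(i)}(\cdot)$, $g^{(i)}(\cdot)$ and $h^{(i)}(\cdot)$ such that they are random polynomials of degree at most $t$, subject to the condition that $f^{(i)}(0) = s_i$, $g^{(i)}(0) = u_i$ and $h^{(i)}(0) = v_i$. Then, on behalf of $\mathcal{F}_{\mathrm{GEN}[\cdot]}$, $\mathcal{S}_{\langle\cdot\rangle\to[\cdot]}$ creates $[s_i]$ exactly as $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ would compute it on input $(\mathsf{sid}, i, f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot))$ from $P_i$ as the dealer. Then it hands $(\mathsf{sid}, k, i, [s_i]_k)$ to every $P_k \in \mathcal{T}$. On receiving $(\mathsf{sid}, k, f^{(k)}(\cdot), g^{(k)}(\cdot), h^{(k)}(\cdot))$ from a corrupted $P_k \in \mathcal{T}$ acting as D, $\mathcal{S}_{\langle\cdot\rangle\to[\cdot]}$ acts exactly as $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ and returns either $(\mathsf{sid}, i, k, \mathsf{Failure})$ or $(\mathsf{sid}, i, k, [s_k]_i)$ to every $P_i \in \mathcal{T}$. It returns $(\mathsf{sid}, \mathsf{Failure}, P_k)$ to $\mathcal{S}_f$ if $(\mathsf{sid}, i, k, \mathsf{Failure})$ was generated for any $P_k \in \mathcal{T}$.

Otherwise, it locally computes $[s]_i = \sum_{P_k \in \mathcal{X}} c_k [s_k]_i$ for every honest $P_i \in \mathcal{P}\setminus\mathcal{T}$, where the $c_k$ are the publicly known Lagrange interpolation coefficients for $\mathcal{X}$, returns $(\mathsf{sid}, \{[s]_i\}_{P_i \in \mathcal{P}\setminus\mathcal{T}})$ to $\mathcal{S}_f$ and halts.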


Fig. 21. Simulator $\mathcal{S}_{\langle\cdot\rangle\to[\cdot]}$ to be Invoked by the MPC Simulator $\mathcal{S}_f$ for Simulating the Steps of Sub-protocol $\Pi_{\langle\cdot\rangle\to[\cdot]}$ in $\Pi_f$
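The Failure outputs of the sub-simulators drive the bookkeeping of Fig. 19 on the simulator's local copies $\hat{\mathcal{P}}$, $\hat{n}$ and $\hat{t}$ of the party set, the number of parties and the threshold: an accused pair is removed with $\hat{t}$ decreased by one and $\hat{n}$ by two, and a single accused party is removed with both decreased by one. Since at least one accused party is corrupt, these updates preserve an honest majority: $2(\hat{t}-1) < \hat{n}-2$ holds exactly when $2\hat{t} < \hat{n}$, and $2(\hat{t}-1) < \hat{n}-1$ follows from it. The following sketch tracks these variables; the class and function names are hypothetical, and the honest-majority check is an illustration rather than the protocol's exact threshold condition.

```python
from dataclasses import dataclass


@dataclass
class LocalState:
    """Hypothetical container for the simulator's local variables P-hat, n-hat, t-hat."""
    parties: set
    n: int
    t: int
    new_com: int = 1


def on_failure(state, accused):
    """Fig. 19 update: drop the accused party or pair and decrement t-hat by one."""
    assert len(accused) in (1, 2)
    state.parties -= set(accused)
    state.n -= len(accused)
    state.t -= 1           # at least one accused party is corrupt
    state.new_com = 1      # a fresh committee must be selected
    return state


st = LocalState(parties=set(range(1, 10)), n=9, t=4)   # 2t < n initially
on_failure(st, [3, 7])                                  # (Failure, P_a, P_b)
on_failure(st, [5])                                     # (Failure, P_a)
```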




APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


Simulator $\mathcal{S}_{\mathrm{RANDZERO}[\cdot]}$

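For the session id sid, on receiving $(\mathsf{sid}, y - \bar{y})$ from $\mathcal{S}_f$, $\mathcal{S}_{\mathrm{RANDZERO}[\cdot]}$ interacts with the corrupted parties in $\mathcal{T}$ on behalf of the honest parties in an instance of Protocol $\Pi_{\mathrm{RANDZERO}[\cdot]}$. The simulator proceeds as follows:

Publicly Committing 0:

- On behalf of one honest party $P_h$ (it simply chooses any honest party from the set $\mathcal{P}$), it randomly selects $u_h, v_h \in \mathbb{F}_p$, sets $r_h = y - \bar{y}$ and computes $C_{r_h,u_h,v_h} = \mathsf{Comm}_{ck}(y - \bar{y}; u_h, v_h)$. On behalf of every other honest party $P_i$, it randomly selects $u_i, v_i \in \mathbb{F}_p$, sets $r_i = 0$ and computes $C_{r_i,u_i,v_i} = \mathsf{Comm}_{ck}(r_i; u_i, v_i)$. On behalf of $\mathcal{F}_{\mathrm{ZK.BC}}$ corresponding to every honest $P_i$, it then sends $(\mathsf{sid}, i, C_{r_i,u_i,v_i})$ to every $P_j \in \mathcal{T}$. On receiving $(\mathsf{sid}, i, C_{r_i,u_i,v_i}, u_i, v_i)$ from every corrupted $P_i \in \mathcal{T}$, it acts as $\mathcal{F}_{\mathrm{ZK.BC}}$ and verifies whether $C_{r_i,u_i,v_i} = \mathsf{Comm}_{ck}(0; u_i, v_i)$. If the test passes, then the simulator, on behalf of $\mathcal{F}_{\mathrm{ZK.BC}}$, sends $(\mathsf{sid}, i, C_{r_i,u_i,v_i})$ to every $P_j \in \mathcal{T}$. Otherwise, it sends $(\mathsf{sid}, i, \perp)$ to every $P_j \in \mathcal{T}$.

- It then constructs a set $\mathcal{T}'$, initialized to $\emptyset$, and includes in $\mathcal{T}'$ all the honest parties in $\mathcal{P}$, together with every $P_i \in \mathcal{T}$ for which $C_{r_i,u_i,v_i} = \mathsf{Comm}_{ck}(0; u_i, v_i)$ held.

$[\cdot]$-sharing 0:

- On behalf of each honest party $P_i$, it selects three random polynomials $f^{(i)}(\cdot)$, $g^{(i)}(\cdot)$ and $h^{(i)}(\cdot)$, each of degree at most $t$, subject to the condition that $f^{(i)}(0) = r_i$, $g^{(i)}(0) = u_i$ and $h^{(i)}(0) = v_i$. On behalf of $\mathcal{F}_{\mathrm{GEN}[\cdot]}$ for an honest $P_i$, it sends $(\mathsf{sid}, j, i, [r_i]_j)$ to every $P_j \in \mathcal{T}$. Then, for every corrupted $P_i \in \mathcal{T}$, on receiving $(\mathsf{sid}, i, f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot))$ from $P_i$ (as the dealer), $\mathcal{S}_{\mathrm{RANDZERO}[\cdot]}$, on behalf of $\mathcal{F}_{\mathrm{GEN}[\cdot]}$, sends $(\mathsf{sid}, j, i, [f^{(i)}(0)]_j)$ to every $P_j \in \mathcal{T}$, after verifying the polynomials $f^{(i)}(\cdot), g^{(i)}(\cdot), h^{(i)}(\cdot)$ with respect to the corresponding commitment $C_{r_i,u_i,v_i}$ (as done by the functionality $\mathcal{F}_{\mathrm{GEN}[\cdot]}$). If the polynomials fail the test, it removes $P_i$ from $\mathcal{T}'$.

- It locally computes $[y - \bar{y}]_i = \sum_{P_j \in \mathcal{T}'} [r_j]_i$, returns $(\mathsf{sid}, \{[y - \bar{y}]_i\}_{P_i \in \mathcal{P}\setminus\mathcal{T}})$ to $\mathcal{S}_f$ and halts.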


Fig. 22. Simulator for $\Pi_{\mathrm{RANDZERO}[\cdot]}$
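The rerandomization step works because $[\cdot]$-sharings are linear. Write $\bar{y}$ for the output the simulator obtains by evaluating the circuit on 0s for the honest inputs: adding the simulated sharing $[\bar{y}]$ to the sum of the accepted parties' sharings $[r_j]$, in which only the chosen honest party $P_h$ contributes $r_h = y - \bar{y}$ and every other contribution is a sharing of 0, produces a sharing that opens to the true output $y$. A minimal sketch of this arithmetic over a toy field, assuming plain Shamir shares without the commitment components and assuming every party's contribution was accepted; the field, the values and the helper names are hypothetical.

```python
import random

p = 101  # toy prime field (hypothetical)


def share(secret, t, xs, rng):
    """Shamir-share `secret` with a random degree-t polynomial; return {x: f(x)}."""
    coeffs = [secret % p] + [rng.randrange(p) for _ in range(t)]
    return {x: sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p for x in xs}


def reconstruct(shares):
    """Lagrange interpolation at 0 from a dict {x: f(x)}."""
    total = 0
    for i, yi in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num, den = num * (-j) % p, den * (i - j) % p
        total += yi * num * pow(den, -1, p)
    return total % p


rng = random.Random(3)
t, parties = 2, [1, 2, 3, 4, 5]
y, y_bar = 77, 9                                    # true output and simulated output
y_bar_sh = share(y_bar, t, parties, rng)            # [y_bar], from the evaluation on 0s
h = 1                                               # the chosen honest party P_h
r = {i: (y - y_bar) % p if i == h else 0 for i in parties}   # every other r_i is 0
r_sh = [share(r[i], t, parties, rng) for i in parties]       # each [r_i]
diff_sh = {x: sum(s[x] for s in r_sh) % p for x in parties}  # [y - y_bar] = sum of [r_i]
y_sh = {x: (y_bar_sh[x] + diff_sh[x]) % p for x in parties}  # [y] = [y_bar] + [y - y_bar]
```

The corrupted parties only ever see a commitment to $r_h$, so, assuming the commitment is hiding, committing to $y - \bar{y}$ instead of 0 goes unnoticed.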
Abstract. We present a computationally secure MPC protocol for threshold adversaries which is parametrized by a value $L$. When $L = 2$ we obtain a classical form of MPC protocol in which interaction is required for multiplications; as $L$ increases, interaction is reduced, in that one requires interaction only after computing a higher degree function. When $L$ approaches infinity one obtains the FHE based protocol of Gentry, which requires no interaction. Thus one can trade communication for computation in a simple way. Our protocol is based on an interactive protocol for “bootstrapping” a somewhat homomorphic encryption (SHE) scheme. The key contribution is that our protocol is highly communication efficient, enabling us to obtain reduced communication compared to traditional MPC protocols for relatively small values of $L$.


1  Introduction 

In the last few years computing on encrypted data via either Fully Homomorphic Encryption (FHE) or Multi-Party Computation (MPC) has been subject to a remarkable number of improvements. Firstly, FHE was shown to be possible [29], and this was quickly followed by a variety of applications and performance improvements [9, 12, 11, 30, 31, 39, 40]. Secondly, whilst MPC has been around for over thirty years, only in the last few years have we seen an increased emphasis on practical instantiations, with some very impressive results [8, 22, 37].

We focus on MPC where $n$ parties wish to compute a function on their respective inputs. Whilst the computational overhead of MPC protocols, compared to computing “in the clear”, is relatively small (for example, in practical protocols such as [25, 37] it is a small constant multiple of the “in the clear” cost), the main restriction on practical deployment of MPC is the communication cost. Even for protocols in the preprocessing model evaluating arithmetic circuits over $\mathbb{F}_p$, the communication cost, in terms of the number of bits per multiplication gate and per party, is a constant multiple of the bit length, $\log p$, of the data being manipulated, for a typically large value of the constant. This is a major drawback of MPC protocols, since communication is generally more expensive than computation. Theoretical results like [19] (for the computational case) and [20] (for the information theoretic case) bring down the per gate per party communication cost to a very small quantity; essentially $O(\ldots \cdot \log|C| \cdot \log p)$ bits for a circuit $C$ of size $|C|$. While these results suggest that the communication cost can be asymptotically brought down to a constant for large $n$, the constants are known to be too large for any practical purpose. Our interest lies in constructing efficient MPC protocols where the efficiency is measured in terms of exact complexity rather than asymptotic complexity.

In his thesis, Gentry [28] showed how FHE can be used to reduce the communication cost of MPC down to virtually zero for any number of parties. In Gentry's MPC protocol all parties encrypt their inputs under a shared FHE public key and send them to each other. They then compute the function homomorphically, and at the end perform a shared decryption. This implies an MPC protocol whose communication is limited to a function of the input and output sizes, and not of the complexity of the circuit. However, this reduction in communication complexity comes at a cost, namely the huge expense of evaluating the function homomorphically. With current understanding of FHE technology, this solution is completely infeasible in practice.

A variant of Gentry's protocol was presented by Asharov et al. in [1], where the parties outsource their computation to a server and only interact via a distributed decryption. The key innovation in [1] was that independently generated FHE keys can be combined into a “global” FHE key with distributed decryption capability. We do not assume such a functionality of the keys (but one can easily extend our results to accommodate this); instead we focus on using distributed decryption to enable efficient multi-party bootstrapping. In addition, the work of [1], in requiring an FHE scheme as opposed to the SHE scheme of our work, requires the assumption of circular security of the underlying FHE scheme (and hence more assumptions).

This article is based on an earlier article: Asiacrypt 2013, IACR 2013, http://dx.doi.org/10.1007/978-3-642-42045-0_12.

In [25], following on the work in [7], the authors propose an MPC protocol which uses an SHE scheme as an “optimization”. Working in the preprocessing model, the authors utilize an SHE scheme which can evaluate circuits of multiplicative depth one to optimize the preprocessing step of an essentially standard MPC protocol. The optimizations, and use of SHE, in [25] are focused on computational improvements. In this work we invert the use of SHE in [25] by using it for the online phase of the MPC protocol, so as to optimize the communication efficiency for any number of parties.

In essence we interpolate between the two extremes of traditional MPC protocols (with high communication but low computational costs) and Gentry's FHE based solution (with high computation but low communication costs). Our interpolation depends on a parameter, which we label as $L$, where $L \geq 2$. At one extreme, for $L = 2$, our protocol resembles traditional MPC protocols, whilst at the other extreme, for $L = \infty$, our protocol is exactly Gentry's FHE based solution. We emphasize that our construction is general, in that any SHE scheme can be used which supports homomorphic computation of depth two circuits and threshold decryption. Thus the requirements on the underlying SHE scheme are much weaker than those of previous SHE (FHE) based MPC protocols, such as the one by Asharov et al. [1], which relies on the specifics of LWE (learning with errors) based SHE, i.e. key-homomorphism, and demands homomorphic computation of depth $L$ circuits for big enough $L$ to bootstrap.

The solution we present is in the preprocessing model, in which we allow a preprocessing phase which can compute data that is neither input nor function dependent. This preprocessed data is then consumed in the online phase. As usual in such a model, our goal is efficiency in the online phase only. We present our basic protocol and efficiency analysis for the case of passive threshold adversaries only, i.e. we can tolerate up to $t$ passive corruptions where $t < n$. We then note that security against $t$ active adversaries with $t < n/3$ can be achieved for no extra cost in the online phase. For the active security case, essentially the same communication costs can be achieved even when $t < n/2$, bar some extra work (which is independent of $|C|$) to eliminate the cheating parties when they are detected. The security of our protocols is proven in the standard UC framework [13].

We note that our focus is on MPC protocols providing robustness and fairness*, which are impossible to achieve in general without assuming $t < n/2$ [15, 32]. Indeed, in several real-life applications it may be desirable to have these properties. However, we stress that we could deal with the dishonest majority setting (i.e. $t < n$) by utilizing additional zero-knowledge proof techniques to show that the distributed decryptions are performed correctly; however, as our goal is to achieve low exact communication complexity (as opposed to low asymptotic complexity), we feel that such a discussion would deviate from the thrust of our work. In adding the corresponding proofs of correctness we would still achieve an asymptotic improvement in communication complexity over other MPC protocols with dishonest majority; but this is not our focus, and so in the rest of the paper we do not discuss the dishonest majority setting.

Finally, we note that our results on communication complexity in the computational setting, both in a practical and in an asymptotic sense, are comparable to (if not better than) the best known results in the information theoretic and computational settings. Namely, the best known optimally resilient statistically secure MPC protocol with $t < n/2$ has (asymptotic) communication complexity of $O(n)$ per multiplication [5], whereas ours is $O(n/L)$ (see Section 9 for the analysis of our protocol). With near optimal resiliency of $t < (1/3 - \epsilon)n$, the best known perfectly secure MPC protocol has (asymptotic) communication complexity of $O(\mathrm{polylog}\, n)$ per multiplication [20], but a huge constant is hidden under the $O$. In the computational setting, with near optimal resiliency of $t < (1/2 - \epsilon)n$, the best known MPC protocol has (asymptotic) communication complexity of $O(\mathrm{polylog}\, n)$ per multiplication [19], but again a huge constant is hidden under the $O$. None of these protocols outperforms ours when exact communication complexity is compared, even for small values of $L$.


* Informally, robustness means that the adversary cannot prevent the honest parties from obtaining the correct output, while fairness guarantees that either everyone receives the output or no one does.
Overview: Our protocol is intuitively simple. We first take an $L$-levelled SHE scheme (strictly it has $L + 1$ levels, but can evaluate circuits with $L$ levels of multiplications) which possesses a distributed decryption protocol for the specific access structure required by our MPC protocol. We assume that the SHE scheme is implemented over a ring which supports $N$ embeddings of the underlying finite field $\mathbb{F}_p$ into the message space of the SHE scheme. Almost all known SHE schemes support such packing of the finite field into the plaintext slots in an SIMD manner [30, 40], and such packing has been crucial in the implementation of SHE in various applications [21, 25, 31].

Clearly, with such a setup we can implement Gentry's MPC solution for circuits of multiplicative depth $L$. All that remains is how to “bootstrap” from circuits with multiplicative depth $L$ to arbitrary circuits. The standard solution would be to bootstrap the FHE scheme directly, following the blueprint outlined in Gentry's thesis. However, in the case of applications to MPC we could instead utilize a protocol to perform the bootstrapping. In a nutshell, that is exactly what we propose.

The main issue, then, is to show how to efficiently perform the bootstrapping in a distributed manner, where efficiency is measured in terms of computational and communication performance. Naively performing an MPC protocol to execute the bootstrapping phase will lead to a large communication overhead, due to the inherent overhead in dealing with homomorphic encryptions. On its own this is enough to obtain our asymptotic interpolation between FHE and MPC; we, however, aim to provide an efficient and practical interpolation, that is, one which is efficient for small values of $L$. It turns out that a special case of a suitable bootstrapping protocol can be found as a sub-procedure of the MPC protocol in [25]. We extract the required protocol, generalise it, and then apply it to our MPC situation.

To  ease  exposition  we  will  not  utilize  the  packing  from  [30]  to  perform  evaluations  of  the  depth  L  sub-circuits; 
we  see  this  as  a  computational  optimization  which  is  orthogonal  to  the  issues  we  will  explore  in  this  paper.  In  any 
practical  instantiation  of  the  protocol  of  this  paper  such  a  packing  could  be  used,  as  described  in  [30],  in  evaluating  the 
circuit  of  multiplicative  depth  L.  However,  we  will  use  this  packing  to  perform  the  bootstrapping  in  a  communication 
efficient  manner. 

The bootstrapping protocol runs in two phases. In the first (offline) phase we repeatedly generate sets of ciphertexts, one set for each party, such that all parties learn the ciphertexts but only the given party learns their underlying messages (which are assumed to be packed). The offline phase can be run in either a passive, covert or active security model, irrespective of the underlying access structure of the MPC protocol, following ideas from [22]. In the second (online) phase the data to be bootstrapped is packed together, a random mask is added (computed from the offline phase data), a distributed decryption protocol is executed to obtain the masked data, which is then re-encrypted, the mask is subtracted and then the data is unpacked. All these steps are relatively efficient, with communication only being required for the distributed decryption.

To apply our interactive bootstrapping method efficiently we need to make a mild assumption on the circuit being evaluated; this is similar to the assumptions used in [19, 20, 26]. The assumption can be intuitively seen as saying that the circuit is wide enough to enable packing of enough values which need to be bootstrapped at each respective level. We expect that most circuits in practice will satisfy our assumption, and we will call the circuits which satisfy our requirement “well formed”.

We pause to note that the ability to open data within the MPC protocol enables one to perform more than a simple evaluation of an arithmetic circuit. This observation is well known in the MPC community, where it has been used to obtain efficient protocols for higher level functions [14, 18]. Thus enabling a distributed bootstrapping also enables one to produce more efficient protocols than purely FHE based ones.

We instantiate our protocol with the BGV scheme [10] and obtain sufficient parameter sizes following the methodology in [22, 31]. Due to the way we utilize the BGV scheme, we need to restrict to MPC protocols for arithmetic circuits over a finite field $\mathbb{F}_p$ with $p \equiv 1 \pmod{m}$, where $m = 2 \cdot N$ and $N = 2^r$ for some $r$. The distributed decryption method uses a “smudging” technique, which means that the modulus used in the BGV scheme needs to be larger than what one would need to perform just the homomorphic operations. Removing this smudging technique, and hence obtaining an efficient protocol for distributed decryption for any SHE scheme, is an interesting open problem, with many potential applications including the one described in this paper.

We show that even for a very small value of $L$, in particular $L = 5$, we can achieve better communication efficiency than many practical MPC protocols in the preprocessing model. Most practical MPC protocols, such as [8, 25, 37], require the transmission of at least two finite field elements per multiplication gate between each pair of parties. In [25] a technique is presented which can reduce this to the transmission of an average of three field elements per multiplication gate per party (and not per pair of parties). Note that the models in [8] (three party, one passive adversary) and [25, 37] ($n$ party, dishonest majority, active security) are different from ours (we assume honest majority, active security); but even mapping these protocols to our setting of $n$ party honest majority would result in the same communication characteristics. We show that for relatively small values of $L$, i.e. $L > 8$, one can obtain a communication efficiency of less than one field element per gate and party (details available in Section 9).

Clearly, by setting $L$ appropriately one can obtain a communication efficiency which improves upon that in [19, 20], albeit we are only interested in communication in the online phase of a protocol in the preprocessing model, whilst [19, 20] discuss the total communication cost over all phases. But we stress that this is not in itself interesting, as Gentry's FHE based protocol can beat the communication efficiency of [19, 20] in any case. What is interesting is that we can beat the communication efficiency of the online phase of practical MPC protocols with very small values of $L$ indeed. Thus the protocol in this paper may provide a practical tradeoff between existing MPC protocols (which consume high bandwidth) and FHE based protocols (which require huge computation).

Our protocol therefore enables the following use-case. It is known that SHE schemes only become prohibitively computationally expensive for large $L$; indeed, one of the reasons why the protocols in [22, 25] are so efficient is that they restrict to homomorphically evaluating circuits of multiplicative depth one. With our protocol, parties can decide a priori on a value of $L$ which enables them to produce a computationally efficient SHE scheme. They can then execute an MPC protocol with communication costs reduced by effectively a factor of $L$. Over time, as SHE technology improves, the value of $L$ can be increased and we can obtain Gentry's original protocol. Thus our methodology enables us to interpolate between the case of standard MPC and the eventual goal of MPC with almost zero communication costs.

2  Well  Formed  Circuits 

In this section we define what we mean by well formed circuits, and the pre-processing which we require on our circuits. We take as given an arithmetic circuit $C$ defined over a finite field $\mathbb{F}_p$. In particular, the circuit $C$ is a directed acyclic graph whose edges are made up of $n_I$ input wires, $n_O$ output wires and $n_W$ internal wires, and whose nodes are given by a set of gates $\mathcal{G}$. The gates are divided into a set of Add gates $\mathcal{G}_A$ and a set of Mult gates $\mathcal{G}_M$, with $\mathcal{G} = \mathcal{G}_A \cup \mathcal{G}_M$, each Add/Mult gate taking two wires (or a constant value in $\mathbb{F}_p$) as input and producing one wire as output. The circuit is such that all input wires are open on their input ends and all output wires are open on their output ends, with the internal wires being connected on both ends. We let the depth $d$ of the circuit be the length of the maximum path from an input wire to an output wire. Our definition of a well formed circuit is parametrized by two positive integer values $N$ and $L$.

We now inductively associate to each wire in the circuit an integer valued label as follows. The input wires are given the label one; all other wires are then given a label according to the following rule (where we assume that a constant input to a gate has label L):

Label of output wire of Add gate = min(Label of input wires),

Label of output wire of Mult gate = min(Label of input wires) - 1.

Thus the minimum value of a label is 1 - d (which is negative for a general d). Looking ahead, the reason for starting with an input label of one is that, when we match this up with our MPC protocol, it results in low communication complexity for the input stage of the computation.
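As a minimal sketch of the labelling rule, the following Python assigns labels to the wires of a circuit whose gates are listed in topological order. The gate encoding (a tuple of kind, two inputs and an output name) and the use of Python ints to stand for constant inputs are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: (kind, left, right, out), where kind is "add" or "mult",
# and left/right are wire names, or Python ints standing for constants in F_p.
Gate = Tuple[str, Union[str, int], Union[str, int], str]


def assign_labels(inputs: List[str], gates: List[Gate], L: int) -> Dict[str, int]:
    """Label every wire following Section 2; gates must be in topological order."""
    labels: Dict[str, int] = {w: 1 for w in inputs}  # input wires get label one

    def label_of(x: Union[str, int]) -> int:
        # A constant input to a gate is treated as having label L.
        return L if isinstance(x, int) else labels[x]

    for kind, left, right, out in gates:
        m = min(label_of(left), label_of(right))
        if kind == "add":
            labels[out] = m          # Add: minimum of the input labels
        elif kind == "mult":
            labels[out] = m - 1      # Mult: minimum of the input labels, minus one
        else:
            raise ValueError(f"unknown gate kind {kind!r}")
    return labels
```

For example, with L = 4, evaluating ((x * y) + 3) * x gives the three gate outputs the labels 0, 0 and -1: the Add gate keeps the label of its non-constant input, and each of the two multiplications on the path lowers the label by one.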

We now augment the circuit to produce a new circuit $C^{\mathsf{Aug}}$ whose labels lie in the range [1, ..., L], by adding some special gates which we call Refresh gates; the set of such gates is denoted $\mathcal{G}_R$. A Refresh gate takes as input at most N wires, and produces as output an exact copy of the specified input wires. The input wires must have labels in the range [1, ..., L], and all that the Refresh gate does is relabel the gate's input wires to L. At the end of the augmentation process we require the invariant that all wire labels in $C^{\mathsf{Aug}}$ lie in the range [1, ..., L]; the circuit is then essentially a collection of "sub-circuits" of multiplicative depth at most L - 1 glued together using Refresh gates. We require this to be done with as few Refresh gates as possible.

Definition 1 (Well Formed Circuit). A circuit C will be called well formed if the number of Refresh gates in the associated augmented circuit $C^{\mathsf{Aug}}$ is at most
We expect that "most" circuits will be well formed, due to the following argument. We first note that the only gates which concern us are multiplication gates; so without loss of generality we consider a circuit C consisting only of multiplication gates. The circuit has d layers; let the width of C (i.e. the number of gates) at layer i be $W_i$. Consider the algorithm to produce $C^{\mathsf{Aug}}$ which considers each layer in turn, from i = 1 to d, and adds Refresh gates where needed. When reaching layer i in this algorithm we can therefore assume (by induction) that all input wires at this layer have labels in the range [1, ..., L]. To maintain the invariant we only need to apply a Refresh operation to those input wires which have label one. Let $p_i$ denote the proportion of wires at layer i which have label one when we perform this process. Then the number of Refresh gates which we add into $C^{\mathsf{Aug}}$ at layer i is at most $\lceil 2 \cdot p_i \cdot W_i / N \rceil$, where the factor of two comes from the fact that each multiplication gate has two input wires.

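Assuming a large enough circuit, we can assume for most layers that this proportion $p_i$ will be approximately 1/L, since wires will be refreshed after their values have passed through L multiplication gates. So, summing up over all layers, the expected number of Refresh gates in $C^{\mathsf{Aug}}$ will be:

$$\sum_{i=1}^{d} \left\lceil \frac{2 \cdot W_i}{L \cdot N} \right\rceil \approx \frac{2}{L \cdot N} \cdot \sum_{i=1}^{d} W_i = \frac{2 \cdot |\mathcal{G}_M|}{L \cdot N}.$$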
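To see how the rounding in this sum behaves, the following sketch evaluates both sides of the estimate for a list of layer widths $W_i$, under the same assumption that $p_i \approx 1/L$ on every layer. The function name and the sample widths are ours, chosen only for illustration.

```python
import math
from typing import List, Tuple


def refresh_estimate(widths: List[int], L: int, N: int) -> Tuple[int, float]:
    """Return (sum of per-layer ceilings, smooth value 2 * sum(W_i) / (L * N)).

    Both assume p_i is roughly 1/L on every layer, as in the argument above.
    """
    per_layer = sum(math.ceil(2 * w / (L * N)) for w in widths)
    smooth = 2 * sum(widths) / (L * N)
    return per_layer, smooth
```

For wide layers the two values agree: 50 layers of width 1000 with L = 10 and N = 100 give 100 in both cases. For narrow layers the ceiling dominates: two layers of width 10 with L = 4 and N = 8 need 2 Refresh gates rather than the smooth value 1.25.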
Note that we would expect this upper bound on the number of Refresh gates to be easily met for most circuits. For example, our rough analysis above did not take into account the presence of gates with fan-out greater than one (meaning there are fewer wires to Refresh than we estimated above), nor did it take into account using unused slots in the Refresh gates to refresh wires with labels not equal to one.

Determining an optimal algorithm for moving from C to a suitable $C^{\mathsf{Aug}}$ with a minimal number of Refresh gates is an interesting optimization problem, which we leave open; however, the greedy algorithm outlined above clearly works for most circuits.
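The greedy layer-by-layer procedure described above can be sketched as follows for a multiplication-only circuit. To keep it short we make the hypothetical simplification that every gate in layer i reads both inputs from layer i - 1, and we add $\lceil s_i / N \rceil$ Refresh gates at layer i, where $s_i$ is the number of distinct label-one input wires there; circuits whose wires skip layers would need more bookkeeping.

```python
import math
from typing import Dict, List, Tuple

Wire = Tuple[int, int]  # (layer, index); layer 0 holds the circuit input wires


def greedy_refresh(n_inputs: int, layers: List[List[Tuple[int, int]]],
                   L: int, N: int) -> int:
    """Count the Refresh gates added by a layer-by-layer greedy pass.

    Hypothetical model: gate j of layer i (1-based) multiplies wires
    (i - 1, a) and (i - 1, b) of the previous layer.
    """
    if L < 2:
        raise ValueError("need L >= 2 so that a refreshed wire can enter a Mult gate")
    labels: Dict[Wire, int] = {(0, k): 1 for k in range(n_inputs)}  # inputs: label one
    refresh_gates = 0
    for i, gates in enumerate(layers, start=1):
        # Distinct input wires of this layer still at label one (a Mult would drop them to 0).
        stale = sorted({(i - 1, x) for a, b in gates for x in (a, b)
                        if labels[(i - 1, x)] == 1})
        # Each Refresh gate relabels up to N of these wires to L.
        refresh_gates += math.ceil(len(stale) / N)
        for w in stale:
            labels[w] = L
        # Mult gates: output label is the minimum input label minus one (now >= 1).
        for j, (a, b) in enumerate(gates):
            labels[(i, j)] = min(labels[(i - 1, a)], labels[(i - 1, b)]) - 1
    # Invariant of the augmented circuit: every label lies in [1, L].
    assert all(1 <= v <= L for v in labels.values())
    return refresh_gates
```

For a chain of two-gate layers with L = 3, wire labels go 3, 2, 1 and must be refreshed every second layer, so six layers need 3 Refresh gates when N = 2 and 6 when N = 1.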


3 Threshold L-Levelled Packed Somewhat Homomorphic Encryption (SHE)

In this section we present a detailed explanation of the syntax and requirements for our Threshold L-Levelled Packed Somewhat Homomorphic Encryption Scheme. The scheme is parametrized by a number of values: the security parameter k, the number of levels L, the number N of plaintext elements which can be packed into one ciphertext, a statistical security parameter sec (for the security of the distributed decryption), and a pair (t, n) which defines the threshold properties of the scheme. In practice the parameter N will be a function of L and k. The message space of the SHE scheme is defined to be $\mathcal{M} = \mathbb{F}_p^N$, and we embed the finite field $\mathbb{F}_p$ into $\mathcal{M}$ via a map $\chi : \mathbb{F}_p \rightarrow \mathcal{M}$. See Section 7 for a discussion of the various choices one has for $\chi$ when we specialise to the BGV SHE scheme.

Let $\mathcal{C}(L)$ denote the family of circuits consisting of addition and multiplication gates whose labels follow the conventions in Section 2, except that input wires have label L and the minimum wire label is zero. Thus $\mathcal{C}(L)$ is the family of standard arithmetic circuits of multiplicative depth at most L which consist of 2-input addition and multiplication gates over $\mathbb{F}_p$, and whose wire labels lie in the range [0, ..., L]. Informally, a threshold L-levelled SHE scheme supports homomorphic evaluation of any circuit in the family $\mathcal{C}(L)$, with provision for distributed (threshold) decryption, where the input wire values $v_i$ are mapped to ciphertexts (at level L) by encrypting $\chi(v_i)$.

As remarked in the introduction we could also, as in [30], extend the circuit family $\mathcal{C}(L)$ to include gates which process N input values at once:

$$N\text{-}\mathsf{Add}\big((u_1, \ldots, u_N), (v_1, \ldots, v_N)\big) := (u_1 + v_1, \ldots, u_N + v_N),$$

$$N\text{-}\mathsf{Mult}\big((u_1, \ldots, u_N), (v_1, \ldots, v_N)\big) := (u_1 \times v_1, \ldots, u_N \times v_N).$$

Such an optimization of the underlying circuit is orthogonal to our considerations, although the underlying L-levelled packed SHE scheme does support these operations on its underlying plaintexts (we simply do not consider them in the circuits being evaluated).
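At the plaintext level, N-Add and N-Mult are simply slot-wise operations on length-N vectors over $\mathbb{F}_p$. The sketch below shows them for a toy prime p = 101 (an arbitrary choice); in the SHE scheme the same operations are carried out homomorphically on packed ciphertexts.

```python
from typing import List

P = 101  # toy prime, chosen only for illustration


def n_add(u: List[int], v: List[int], p: int = P) -> List[int]:
    """N-Add: slot-wise addition of two length-N vectors over F_p."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length N")
    return [(a + b) % p for a, b in zip(u, v)]


def n_mult(u: List[int], v: List[int], p: int = P) -> List[int]:
    """N-Mult: slot-wise multiplication of two length-N vectors over F_p."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length N")
    return [(a * b) % p for a, b in zip(u, v)]
```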

We can evaluate sub-circuits of circuits in $\mathcal{C}(L)$, and this is how we will describe the homomorphic evaluation below (this will later help us to argue the correctness property of our general MPC protocol). In particular, if $C \in \mathcal{C}(L)$, we can deal with sub-circuits $C^{\mathsf{sub}}$ of C whose input wires have labels $l_1^{\mathsf{in}}, \ldots, l_{\ell_{\mathsf{in}}}^{\mathsf{in}}$ and whose output wires have labels $l_1^{\mathsf{out}}, \ldots, l_{\ell_{\mathsf{out}}}^{\mathsf{out}}$, where $l_i^{\mathsf{in}}, l_j^{\mathsf{out}} \in [0, \ldots, L]$. Then, given ciphertexts $c_1, \ldots, c_{\ell_{\mathsf{in}}}$ encrypting the messages $m_1, \ldots, m_{\ell_{\mathsf{in}}}$, for which the ciphertexts are at levels $l_1^{\mathsf{in}}, \ldots, l_{\ell_{\mathsf{in}}}^{\mathsf{in}}$, the homomorphic evaluation function will produce ciphertexts $\hat{c}_1, \ldots, \hat{c}_{\ell_{\mathsf{out}}}$ at levels $l_1^{\mathsf{out}}, \ldots, l_{\ell_{\mathsf{out}}}^{\mathsf{out}}$ which encrypt the messages corresponding to evaluating $C^{\mathsf{sub}}$ on the components of the vectors $m_1, \ldots, m_{\ell_{\mathsf{in}}}$ in a SIMD manner. More formally:

Definition 2 (Threshold L-levelled Packed SHE). An L-levelled public key packed somewhat homomorphic encryption (SHE) scheme with underlying message space $\mathcal{M} = \mathbb{F}_p^N$, public key space $\mathcal{PK}$, secret key space $\mathcal{SK}$, evaluation key space $\mathcal{EK}$, ciphertext space $\mathcal{CT}$ and distributed decryption key spaces $\mathcal{DK}_i$ for $i \in [1, \ldots, n]$ is a collection of the following PPT algorithms, parametrized by a computational security parameter k and a statistical security parameter sec:

1. $\mathsf{SHE.KeyGen}(1^k, n, t) \rightarrow (\mathsf{pk}, \mathsf{ek}, \mathsf{sk}, \mathsf{dk}_1, \ldots, \mathsf{dk}_n)$: The key generation algorithm outputs a public key $\mathsf{pk} \in \mathcal{PK}$, a public evaluation key $\mathsf{ek} \in \mathcal{EK}$, a secret key $\mathsf{sk} \in \mathcal{SK}$, and n keys $(\mathsf{dk}_1, \ldots, \mathsf{dk}_n)$ for the distributed decryption, with $\mathsf{dk}_i \in \mathcal{DK}_i$.

2. $\mathsf{SHE.Enc}_{\mathsf{pk}}(m, r) \rightarrow (c, L)$: The encryption algorithm computes a ciphertext $c \in \mathcal{CT}$, which encrypts a plaintext vector $m \in \mathcal{M}$ under the public key pk using the randomness¹ r, and outputs (c, L) to indicate that the associated level of the ciphertext is L.

3. $\mathsf{SHE.Dec}_{\mathsf{sk}}(c, l) \rightarrow m'$: The decryption algorithm decrypts a ciphertext $c \in \mathcal{CT}$ of associated level l, where $l \in [0, \ldots, L]$, using the decryption key sk, and outputs a plaintext $m' \in \mathcal{M}$. We say that $m'$ is the plaintext associated with c.

4. $\mathsf{SHE.ShareDec}_{\mathsf{dk}_i}(c, l) \rightarrow \mu_i$: The share decryption algorithm takes a ciphertext c with associated level $l \in [0, \ldots, L]$ and a key $\mathsf{dk}_i$ for the distributed decryption, and computes a decryption share $\mu_i$ of c.

5. $\mathsf{SHE.ShareCombine}((c, l), \{\mu_i\}_{i \in [1, \ldots, n]}) \rightarrow m'$: The share combine algorithm takes a ciphertext c with associated level $l \in [0, \ldots, L]$ and a set of n decryption shares, and outputs a plaintext $m' \in \mathcal{M}$.

6. $\mathsf{SHE.Eval}_{\mathsf{ek}}(C^{\mathsf{sub}}, (c_1, l_1^{\mathsf{in}}), \ldots, (c_{\ell_{\mathsf{in}}}, l_{\ell_{\mathsf{in}}}^{\mathsf{in}})) \rightarrow ((\hat{c}_1, l_1^{\mathsf{out}}), \ldots, (\hat{c}_{\ell_{\mathsf{out}}}, l_{\ell_{\mathsf{out}}}^{\mathsf{out}}))$: The homomorphic evaluation algorithm is a deterministic polynomial time algorithm (polynomial in L, $\ell_{\mathsf{in}}$, $\ell_{\mathsf{out}}$ and k) that takes as input the evaluation key ek, a sub-circuit $C^{\mathsf{sub}}$ of a circuit $C \in \mathcal{C}(L)$ with $\ell_{\mathsf{in}}$ input gates and $\ell_{\mathsf{out}}$ output gates, as well as a set of $\ell_{\mathsf{in}}$ ciphertexts $c_1, \ldots, c_{\ell_{\mathsf{in}}}$ with associated levels $l_1^{\mathsf{in}}, \ldots, l_{\ell_{\mathsf{in}}}^{\mathsf{in}}$, and outputs $\ell_{\mathsf{out}}$ ciphertexts $\hat{c}_1, \ldots, \hat{c}_{\ell_{\mathsf{out}}}$ with associated levels $l_1^{\mathsf{out}}, \ldots, l_{\ell_{\mathsf{out}}}^{\mathsf{out}}$ respectively, where each $l_i^{\mathsf{in}}, l_j^{\mathsf{out}} \in [0, \ldots, L]$ is the label associated to the given input/output wire in $C^{\mathsf{sub}}$.

Algorithm SHE.Eval associates the input ciphertexts with the input gates of $C^{\mathsf{sub}}$ and homomorphically evaluates $C^{\mathsf{sub}}$ gate by gate in a SIMD manner on the components of the input messages. For this, SHE.Eval consists of separate algorithms SHE.Add and SHE.Mult for homomorphically evaluating addition and multiplication gates respectively. More specifically, given two ciphertexts $(c_1, l_1)$ and $(c_2, l_2)$ with associated levels $l_1$ and $l_2$ respectively, where $l_1, l_2 \in [0, \ldots, L]$, then²:

- $\mathsf{SHE.Add}_{\mathsf{ek}}((c_1, l_1), (c_2, l_2)) \rightarrow (c_{\mathsf{Add}}, \min(l_1, l_2))$: The deterministic polynomial time addition algorithm takes as input $(c_1, l_1)$, $(c_2, l_2)$ and outputs a ciphertext $c_{\mathsf{Add}}$ with associated level $\min(l_1, l_2)$.

- $\mathsf{SHE.Mult}_{\mathsf{ek}}((c_1, l_1), (c_2, l_2)) \rightarrow (c_{\mathsf{Mult}}, \min(l_1, l_2) - 1)$: The deterministic polynomial time multiplication algorithm takes as input $(c_1, l_1)$, $(c_2, l_2)$ and outputs a ciphertext $c_{\mathsf{Mult}}$ with associated level $\min(l_1, l_2) - 1$.

- $\mathsf{SHE.ScalarMult}_{\mathsf{ek}}((c_1, l_1), \alpha) \rightarrow (c_{\mathsf{Scalar}}, l_1)$: The deterministic polynomial time scalar multiplication algorithm takes as input $(c_1, l_1)$ and a plaintext $\alpha \in \mathcal{M}$, and outputs a ciphertext $c_{\mathsf{Scalar}}$ with associated level $l_1$.

7. $\mathsf{SHE.Pack}_{\mathsf{ek}}((c_1, l_1), \ldots, (c_N, l_N)) \rightarrow (c, \min(l_1, \ldots, l_N))$: If $c_i$ is a ciphertext with associated plaintext $\chi(m_i)$, then this procedure produces a ciphertext $(c, \min(l_1, \ldots, l_N))$ with associated plaintext $m = (m_1, \ldots, m_N)$.

8. $\mathsf{SHE.Unpack}_{\mathsf{ek}}(c, l) \rightarrow ((c_1, l), \ldots, (c_N, l))$: If c is a ciphertext with associated plaintext $m = (m_1, \ldots, m_N)$, then this procedure produces N ciphertexts $(c_1, l), \ldots, (c_N, l)$ such that $c_i$ has associated plaintext $\chi(m_i)$.

9. $\mathsf{SHE.LowerLevel}_{\mathsf{ek}}((c, l), l') \rightarrow (c, l')$: This procedure, for $l' < l$, produces a ciphertext with the same associated plaintext as (c, l), but at level $l'$.

¹ In the paper, unless explicitly specified otherwise, we assume that some randomness has been used for encryption.

² Without loss of generality we assume that we can perform homomorphic operations on ciphertexts of different levels, since we can always deterministically downgrade the level of any ciphertext to any value between zero and its current value using $\mathsf{SHE.LowerLevel}_{\mathsf{ek}}$.

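We require the following homomorphic property to be satisfied:

- Somewhat Homomorphic SIMD Property: Let $C^{\mathsf{sub}} : \mathbb{F}_p^{\ell_{\mathsf{in}}} \rightarrow \mathbb{F}_p^{\ell_{\mathsf{out}}}$ be a sub-circuit of a circuit C in the family $\mathcal{C}(L)$ with respective inputs $m_1, \ldots, m_{\ell_{\mathsf{in}}} \in \mathcal{M}$, such that $C^{\mathsf{sub}}$, when evaluated N times in a SIMD fashion on the N components of the vectors $m_1, \ldots, m_{\ell_{\mathsf{in}}}$, produces N sets of output values $\hat{m}_1, \ldots, \hat{m}_{\ell_{\mathsf{out}}} \in \mathcal{M}$. Moreover, for $i \in [1, \ldots, \ell_{\mathsf{in}}]$ let $c_i$ be a ciphertext of level $l_i^{\mathsf{in}}$ with associated plaintext vector $m_i$, and let

$$\big((\hat{c}_1, l_1^{\mathsf{out}}), \ldots, (\hat{c}_{\ell_{\mathsf{out}}}, l_{\ell_{\mathsf{out}}}^{\mathsf{out}})\big) = \mathsf{SHE.Eval}_{\mathsf{ek}}\big(C^{\mathsf{sub}}, (c_1, l_1^{\mathsf{in}}), \ldots, (c_{\ell_{\mathsf{in}}}, l_{\ell_{\mathsf{in}}}^{\mathsf{in}})\big).$$

Then the following holds with probability one for each $i \in [1, \ldots, \ell_{\mathsf{out}}]$:

$$\mathsf{SHE.Dec}_{\mathsf{sk}}(\hat{c}_i, l_i^{\mathsf{out}}) = \hat{m}_i.$$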
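The level bookkeeping of Definition 2 and the SIMD property can be made concrete with a deliberately insecure mock in which a "ciphertext" simply stores its plaintext vector and its level. Everything here (the class names, the toy parameters, and the embedding $\chi(a) = (a, \ldots, a)$) is a hypothetical illustration of the syntax only; it performs no encryption and provides no security.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class MockCt:
    """INSECURE stand-in for an SHE ciphertext: plaintext slots are stored in the clear."""
    slots: Tuple[int, ...]  # associated plaintext in M = F_p^N
    level: int


class MockSHE:
    """Tracks the level rules of Definition 2 for toy (p, N, L); no encryption at all."""

    def __init__(self, p: int, N: int, L: int) -> None:
        self.p, self.N, self.L = p, N, L

    def _ct(self, slots: Sequence[int], level: int) -> MockCt:
        if not 0 <= level <= self.L:
            raise ValueError(f"level {level} outside [0, {self.L}]")
        return MockCt(tuple(x % self.p for x in slots), level)

    def chi(self, a: int) -> List[int]:
        # Hypothetical embedding F_p -> M: copy a into every slot.
        return [a % self.p] * self.N

    def enc(self, m: Sequence[int]) -> MockCt:
        if len(m) != self.N:
            raise ValueError("plaintext must have N slots")
        return self._ct(m, self.L)  # fresh ciphertexts are at level L

    def dec(self, c: MockCt) -> List[int]:
        return list(c.slots)

    def add(self, c1: MockCt, c2: MockCt) -> MockCt:
        return self._ct([a + b for a, b in zip(c1.slots, c2.slots)], min(c1.level, c2.level))

    def mult(self, c1: MockCt, c2: MockCt) -> MockCt:
        # Output level is min(l1, l2) - 1; _ct rejects a negative level.
        return self._ct([a * b for a, b in zip(c1.slots, c2.slots)], min(c1.level, c2.level) - 1)

    def scalar_mult(self, c: MockCt, alpha: Sequence[int]) -> MockCt:
        return self._ct([a * s for a, s in zip(c.slots, alpha)], c.level)

    def lower_level(self, c: MockCt, new_level: int) -> MockCt:
        if new_level > c.level:
            raise ValueError("LowerLevel cannot raise the level")
        return self._ct(c.slots, new_level)

    def pack(self, cts: Sequence[MockCt]) -> MockCt:
        # cts[i] carries chi(m_i); slot i of the result is m_i.
        if len(cts) != self.N:
            raise ValueError("Pack expects N ciphertexts")
        return self._ct([c.slots[i] for i, c in enumerate(cts)], min(c.level for c in cts))

    def unpack(self, c: MockCt) -> List[MockCt]:
        # The i-th output carries chi(m_i) at the same level as c.
        return [self._ct(self.chi(x), c.level) for x in c.slots]
```

For instance, with p = 101, N = 3 and L = 2, multiplying two fresh ciphertexts gives a level-1 result, squaring that gives level 0, and any further multiplication raises an error, which is exactly the point at which the protocol needs a Refresh gate.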

We also require the following security properties:

- Key Generation Security: Let S and $D_i$ be the random variables which denote the probability distributions with which the secret key sk and the i-th key $\mathsf{dk}_i$ for the distributed decryption are selected from $\mathcal{SK}$ and $\mathcal{DK}_i$ respectively by SHE.KeyGen, for i = 1, ..., n. Moreover, for a set $I \subseteq \{1, \ldots, n\}$, let $D_I$ denote the random variable which denotes the probability distribution with which the set of keys for the distributed decryption belonging to the indices in I is selected from the corresponding $\mathcal{DK}_i$s by SHE.KeyGen. Then the following two properties hold:

• Correctness: For any set $I \subseteq \{1, \ldots, n\}$ with $|I| \geq t + 1$, $H(S \mid D_I) = 0$. Here $H(X \mid Y)$ denotes the conditional entropy of a random variable X with respect to a random variable Y [16].

• Privacy: For any set $I \subseteq \{1, \ldots, n\}$ with $|I| \leq t$, $H(S \mid D_I) = H(S)$.
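One standard way to obtain these two entropy conditions is Shamir secret sharing with a degree-t polynomial. The sketch below illustrates them over a tiny field, sharing a single element of $\mathbb{F}_p$ rather than an actual SHE secret key; it is an illustration of the conditions, not necessarily the key generation of the instantiation in Section 7. Any t + 1 shares determine the secret by Lagrange interpolation (so $H(S \mid D_I) = 0$), while every candidate secret is consistent with exactly one polynomial through any t given shares (so those shares carry no information about S).

```python
import itertools
from typing import Dict, List


def eval_poly(coeffs: List[int], x: int, p: int) -> int:
    """Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x^2 + ... over F_p."""
    return sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p


def share(secret: int, random_coeffs: List[int], n: int, p: int) -> Dict[int, int]:
    """Shares f(1), ..., f(n) of f(X) = secret + c_1 X + ... + c_t X^t."""
    return {i: eval_poly([secret] + random_coeffs, i, p) for i in range(1, n + 1)}


def reconstruct(shares: Dict[int, int], p: int) -> int:
    """Lagrange interpolation at X = 0; needs at least t + 1 shares."""
    s = 0
    for i, y in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = num * (-j) % p
                den = den * (i - j) % p
        s = (s + y * num * pow(den, -1, p)) % p  # pow(den, -1, p) needs Python 3.8+
    return s


def consistent_polys(view: Dict[int, int], t: int, p: int) -> Dict[int, int]:
    """For each candidate secret s, count degree-<=t polynomials with f(0) = s matching view."""
    return {
        s: sum(
            all(eval_poly([s] + list(cs), i, p) == y for i, y in view.items())
            for cs in itertools.product(range(p), repeat=t)
        )
        for s in range(p)
    }
```

With p = 7, t = 2 and $f(X) = 5 + 3X + X^2$, any three of the four shares recover 5, whereas for the two shares f(1), f(2) every secret in $\mathbb{F}_7$ has exactly one consistent polynomial.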

- Semantic Security: For every set $I \subseteq \{1, \ldots, n\}$ with $|I| \leq t$ and all PPT adversaries $\mathcal{A}$, the advantage of $\mathcal{A}$ in the following game is negligible in k:

• Key Generation: The challenger runs $\mathsf{SHE.KeyGen}(1^k, 1^{\mathsf{sec}}, n, t)$ to obtain $(\mathsf{pk}, \mathsf{ek}, \mathsf{sk}, \mathsf{dk}_1, \ldots, \mathsf{dk}_n)$ and sends pk, ek and $\{\mathsf{dk}_i\}_{i \in I}$ to $\mathcal{A}$.

• Challenge: $\mathcal{A}$ sends plaintexts $m_0, m_1 \in \mathcal{M}$ to the challenger, who randomly selects $b \in \{0, 1\}$ and sends $(c, L) = \mathsf{SHE.Enc}_{\mathsf{pk}}(m_b, r)$, for some randomness r, to $\mathcal{A}$.

• Output: $\mathcal{A}$ outputs $b'$.

The advantage of $\mathcal{A}$ in the above game is defined to be $\left| \frac{1}{2} - \Pr[b' = b] \right|$.

- Correct Share Decryption: For any $(\mathsf{pk}, \mathsf{ek}, \mathsf{sk}, \mathsf{dk}_1, \ldots, \mathsf{dk}_n)$ obtained as the output of SHE.KeyGen, the following should hold for any ciphertext (c, l) with associated level $l \in [0, \ldots, L]$:

$$\mathsf{SHE.Dec}_{\mathsf{sk}}(c, l) = \mathsf{SHE.ShareCombine}\big((c, l), \{\mathsf{SHE.ShareDec}_{\mathsf{dk}_i}(c, l)\}_{i \in [1, \ldots, n]}\big).$$

- Share Simulation Indistinguishability: There exists a PPT simulator SHE.ShareSim which, on input a subset $I \subseteq \{1, \ldots, n\}$ of size at most t, a ciphertext (c, l) of level $l \in [0, \ldots, L]$, a plaintext m and |I| decryption shares $\{\mu_i\}_{i \in I}$, outputs $n - |I|$ simulated decryption shares $\{\mu_j^*\}_{j \in \bar{I}}$ with the following property. For any $(\mathsf{pk}, \mathsf{ek}, \mathsf{sk}, \mathsf{dk}_1, \ldots, \mathsf{dk}_n)$ obtained as the output of SHE.KeyGen, any subset $I \subseteq \{1, \ldots, n\}$ of size at most t, any $m \in \mathcal{M}$ and any (c, l) where $m = \mathsf{SHE.Dec}_{\mathsf{sk}}(c, l)$, the following distributions are statistically indistinguishable:

$$\big(\{\mu_i\}_{i \in I}, \mathsf{SHE.ShareSim}((c, l), m, \{\mu_i\}_{i \in I})\big) \approx \big(\{\mu_i\}_{i \in I}, \{\mu_j\}_{j \in \bar{I}}\big),$$

where for all $i \in [1, \ldots, n]$, $\mu_i = \mathsf{SHE.ShareDec}_{\mathsf{dk}_i}(c, l)$. We require in particular that the statistical distance between the two distributions is bounded by $2^{-\mathsf{sec}}$. Moreover,

$$\mathsf{SHE.ShareCombine}\big((c, l), \{\mu_i\}_{i \in I} \cup \mathsf{SHE.ShareSim}((c, l), m, \{\mu_i\}_{i \in I})\big)$$

outputs the result m. Here $\bar{I}$ denotes the complement of the set I, i.e. $\bar{I} = \{1, \ldots, n\} \setminus I$.
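For intuition, consider a toy setting (hypothetical, and not our BGV-based construction) in which decryption shares are additive shares of a single plaintext slot modulo p, so that ShareCombine simply sums them. A simulator given m and the corrupted parties' shares can then choose the honest shares uniformly at random subject to all shares summing to m. In this toy the simulation is perfect; for a lattice-based scheme the shares contain noise and one only obtains the statistical bound $2^{-\mathsf{sec}}$.

```python
import random
from typing import Dict


def share_combine(shares: Dict[int, int], p: int) -> int:
    """Toy ShareCombine: the plaintext is the sum of all n shares mod p."""
    return sum(shares.values()) % p


def share_sim(n: int, corrupt: Dict[int, int], m: int, p: int,
              rng: random.Random) -> Dict[int, int]:
    """Toy ShareSim: uniform honest shares, conditioned on all shares summing to m."""
    honest = [j for j in range(1, n + 1) if j not in corrupt]
    if not honest:
        raise ValueError("need at least one honest party (|I| <= t < n)")
    sim = {j: rng.randrange(p) for j in honest[:-1]}
    # The last honest share is forced so that ShareCombine outputs m.
    sim[honest[-1]] = (m - sum(corrupt.values()) - sum(sim.values())) % p
    return sim
```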

In Section 7 we instantiate the abstract syntax with a threshold SHE scheme based on the BGV scheme [10]. We pause to note the difference between our underlying SHE, which is just an SHE scheme supporting distributed decryption, and that of [1], which requires a special key-homomorphic FHE scheme.
4 MPC from SHE - The Semi-honest Settings


In this section we present our generic MPC protocol for computing any circuit of arbitrary depth d using an abstract threshold L-levelled SHE scheme. For ease of exposition we first concentrate on the case of semi-honest security, and then deal with active security in Section 5.


Functionality $\mathcal{F}_f$

$\mathcal{F}_f$ interacts with the parties $P_1, \ldots, P_n$ and the adversary $\mathcal{S}$ and is parametrized by an n-input function $f : \mathbb{F}_p^n \rightarrow \mathbb{F}_p$.

- Upon receiving $(\mathsf{sid}, i, x_i)$ from the party $P_i$ for every $i \in [1, \ldots, n]$, where $x_i \in \mathbb{F}_p$, compute $y = C(x_1, \ldots, x_n)$, send $(\mathsf{sid}, y)$ to all the parties and the adversary $\mathcal{S}$, and halt. Here C denotes the (publicly known) well formed arithmetic circuit over $\mathbb{F}_p$ representing the function f.


Fig.  1.  The  Ideal  Functionality  for  Computing  a  Given  Function 


Without loss of generality we make the simplifying assumption that the function f to be computed takes a single input from each party and has a single output; specifically $f : \mathbb{F}_p^n \rightarrow \mathbb{F}_p$. The ideal functionality $\mathcal{F}_f$ presented in Figure 1 computes such a function f, represented by a well formed circuit C. We will present a protocol to realise the ideal functionality $\mathcal{F}_f$ in a hybrid model in which we are given access to an ideal functionality $\mathcal{F}_{\mathsf{SetupGen}}$ which implements a distributed key generation for the underlying SHE scheme. In particular, the $\mathcal{F}_{\mathsf{SetupGen}}$ functionality presented in Figure 2 computes the public key, secret key, evaluation key and the keys for the distributed decryption of an L-levelled SHE scheme, distributes the public key and the evaluation key to all the parties, and sends the i-th key $\mathsf{dk}_i$ (for the distributed decryption) to the party $P_i$ for each $i \in [1, \ldots, n]$. In addition, the functionality computes a random encryption $c_{\mathbf{1}}$ with associated plaintext $\mathbf{1} = (1, \ldots, 1) \in \mathcal{M}$ and sends it to all the parties. Looking ahead, $c_{\mathbf{1}}$ will be required when proving the security of our MPC protocol. The ciphertext $c_{\mathbf{1}}$ is at level one, as we only need it to pre-multiply the ciphertexts which are going to be decrypted via the distributed decryption protocol; thus the output of a multiplication by $c_{\mathbf{1}}$ need only be at level zero. This ensures that (with respect to our instantiation of SHE) the noise is kept to a minimum at this stage of the protocol.


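Functionality $\mathcal{F}_{\mathsf{SetupGen}}$

$\mathcal{F}_{\mathsf{SetupGen}}$ interacts with the parties $P_1, \ldots, P_n$ and the adversary $\mathcal{S}$ and is parametrized by an L-levelled SHE scheme.

- Upon receiving $(\mathsf{sid}, i)$ from the party $P_i$ for every $i \in [1, \ldots, n]$, compute $(\mathsf{pk}, \mathsf{ek}, \mathsf{sk}, \mathsf{dk}_1, \ldots, \mathsf{dk}_n) = \mathsf{SHE.KeyGen}(1^k, n, t)$ and $(c_{\mathbf{1}}, 1) = \mathsf{SHE.LowerLevel}_{\mathsf{ek}}(\mathsf{SHE.Enc}_{\mathsf{pk}}(\mathbf{1}, r), 1)$ for $\mathbf{1} = (1, \ldots, 1) \in \mathcal{M}$ and some randomness r. Finally send $(\mathsf{sid}, \mathsf{pk}, \mathsf{ek}, \mathsf{dk}_i, (c_{\mathbf{1}}, 1))$ to the party $P_i$ for every $i \in [1, \ldots, n]$ and halt.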


Fig.  2.  The  Ideal  Functionality  for  Key  Generation 


4.1 The MPC Protocol in the $\mathcal{F}_{\mathsf{SetupGen}}$-hybrid Model

Here we present our MPC protocol in the $\mathcal{F}_{\mathsf{SetupGen}}$-hybrid model. Let C be the (well formed) arithmetic circuit representing the function f, and let $C^{\mathsf{Aug}}$ be the associated augmented circuit (which includes the necessary Refresh gates). The protocol $\Pi$ (see Figure 3) runs in two phases: offline and online. The computation performed in the offline phase is completely independent of the circuit and of the (private) inputs of the parties, and can therefore be carried out well ahead of the time (namely the online phase) when the function and inputs are known. If the parties have more than one input/output, then one can apply packing/unpacking at the input/output stages of the protocol; we leave this minor modification to the reader.

In the offline phase, the parties interact with $\mathcal{F}_{\mathsf{SetupGen}}$ to obtain the public key, the evaluation key and their respective keys for performing distributed decryption, corresponding to a threshold L-levelled SHE scheme. Next, each party sends encryptions of $\zeta$ random elements, and the parties additively combine them (by applying homomorphic addition to the ciphertexts encrypting the random elements) to generate $\zeta$ ciphertexts at level L of truly random elements (unknown to the adversary). Here $\zeta$ is assumed to be large enough that, for a typical circuit, it exceeds the number of Refresh gates in the circuit, i.e. $\zeta > |\mathcal{G}_R|$. Looking ahead, these random ciphertexts created in the offline phase are used in the online phase to evaluate Refresh gates, by (homomorphically) masking the messages associated with the input wires of a Refresh gate.
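The combined offline plaintexts $m_k = m_{1,k} + \cdots + m_{n,k}$ are unknown to the adversary because, once the corrupted parties' contributions are fixed, adding a uniformly random honest contribution is a bijection on each slot. The small check below confirms this over a toy field; the function names and parameters are illustrative, and the protocol itself performs the sum homomorphically on ciphertexts rather than on plaintexts.

```python
from typing import List


def combined_mask(contributions: List[List[int]], p: int) -> List[int]:
    """Slot-wise sum m_k = m_{1,k} + ... + m_{n,k} over F_p of the parties' vectors."""
    n_slots = len(contributions[0])
    return [sum(vec[j] for vec in contributions) % p for j in range(n_slots)]


def honest_contribution_is_bijective(adversarial: List[List[int]], p: int) -> bool:
    """With one honest party and fixed single-slot adversarial vectors, every
    honest value in F_p yields a distinct combined mask."""
    outputs = {combined_mask(adversarial + [[h]], p)[0] for h in range(p)}
    return outputs == set(range(p))
```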

During the online phase, the parties encrypt their private inputs and distribute the corresponding ciphertexts to all other parties. These ciphertexts are transmitted at level one, thus consuming low bandwidth, and are then elevated to level L by a subsequent Refresh gate (which will have been inserted by the circuit augmentation process). Note that the inputs of the parties are in $\mathbb{F}_p$, so the parties first apply the mapping $\chi$ (embedding $\mathbb{F}_p$ into the message space $\mathcal{M}$ of the SHE) before encrypting their private inputs.

The input stage is followed by the homomorphic evaluation of $C^{\mathsf{Aug}}$, as follows. The addition and multiplication gates are evaluated locally using the addition and multiplication algorithms of the SHE. For each Refresh gate, the parties execute the following protocol to enable a "multiparty bootstrapping" of the input ciphertexts: the parties pick one of the random ciphertexts created in the offline phase (a different ciphertext is used for each Refresh gate) and perform the following computation to refresh N ciphertexts with levels in the range [1, ..., L] and obtain N fresh level-L ciphertexts, with the associated messages unperturbed:

- Let $(c_1, l_1), \ldots, (c_N, l_N)$ be the N ciphertexts with associated plaintexts $\chi(z_1), \ldots, \chi(z_N)$, with every $z_i \in \mathbb{F}_p$, that need to be refreshed (i.e. they are the inputs of a Refresh gate).

- The N ciphertexts are then (locally) packed into a single ciphertext c, which is then homomorphically masked with a random ciphertext from the offline phase.

- The resulting masked ciphertext is then publicly opened via distributed decryption. This allows for the creation of a fresh encryption of the opened value at level L.

- The resulting fresh encryption is then homomorphically unmasked, so that its associated plaintext is the same as the original plaintext prior to the masking.

- This fresh (unmasked) ciphertext is then unpacked to obtain N fresh ciphertexts, having the same associated plaintexts as the original N ciphertexts $c_i$ but at level L.
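The five steps above can be traced with an insecure plaintext mock in which a ciphertext is a pair (slots, level). This sketch omits the pre-multiplication by $c_{\mathbf{1}}$ discussed next, replaces the distributed decryption by reading the slots directly, and assumes the hypothetical embedding $\chi(a) = (a, \ldots, a)$; it only illustrates that the refreshed ciphertexts come back at level L with their plaintexts unchanged.

```python
from typing import List, Tuple

Ct = Tuple[Tuple[int, ...], int]  # INSECURE mock ciphertext: (plaintext slots, level)


def refresh(cts: List[Ct], mask: Ct, L: int, p: int) -> List[Ct]:
    """Mock Refresh gate: pack, mask, open, re-encrypt, unmask, unpack."""
    N = len(cts)
    if not all(1 <= lvl <= L for _, lvl in cts):
        raise ValueError("Refresh inputs must have levels in [1, L]")
    mask_slots, mask_level = mask
    if mask_level != L or len(mask_slots) != N:
        raise ValueError("the offline mask must be an N-slot ciphertext at level L")
    # Packing: input i carries chi(z_i), so slot i of the packed plaintext is z_i.
    z = tuple(slots[i] for i, (slots, _) in enumerate(cts))
    level = min(lvl for _, lvl in cts)
    # Masking: homomorphic addition of the offline ciphertext (level = minimum).
    masked: Ct = (tuple((a + b) % p for a, b in zip(z, mask_slots)), min(level, mask_level))
    # Decrypting: stands in for the distributed decryption, revealing z + m_k.
    opened = masked[0]
    # Re-encrypting with public randomness: a fresh level-L ciphertext of z + m_k.
    fresh: Ct = (opened, L)
    # Unmasking: subtract the offline ciphertext; both operands are at level L.
    unmasked: Ct = (tuple((a - b) % p for a, b in zip(fresh[0], mask_slots)), L)
    # Unpacking: N level-L ciphertexts, the i-th carrying chi(z_i).
    return [((x,) * N, L) for x in unmasked[0]]
```

Only one value, the masked vector z + m_k, is ever opened per Refresh gate, which is why packing N wires into one ciphertext cuts the number of distributed decryptions by a factor of N.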

By packing the ciphertexts together we only need to invoke distributed decryption once, instead of N times. This leads to a more communication-efficient online phase, since distributed decryption is the only operation that demands communication. Without affecting the correctness of the above technique, but to ensure security, we add an additional step while doing the masking: the parties homomorphically pre-multiply the ciphertext c by $c_{\mathbf{1}}$ before masking. Recall that $c_{\mathbf{1}}$ is an encryption of $\mathbf{1} \in \mathcal{M}$ generated by $\mathcal{F}_{\mathsf{SetupGen}}$, so this operation leaves the plaintext associated with c unchanged. During the simulation in the security proof, this step allows the simulator to set the decrypted value to the random mask (irrespective of the circuit inputs), by playing the role of $\mathcal{F}_{\mathsf{SetupGen}}$ and replacing $c_{\mathbf{1}}$ with $c_{\mathbf{0}}$, a random encryption of $\mathbf{0} = (0, \ldots, 0)$. Furthermore, this step explains why we made provision for an extra multiplication during circuit augmentation, by insisting that the Refresh gates take inputs with labels in [1, ..., L] instead of [0, ..., L]; the details are available in the simulation proof of security of our MPC protocol.
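The effect of the pre-multiplication can be checked at the plaintext level: slot-wise, the opened value at a Refresh gate is $z \cdot e + m_k$, where e = 1 when $c_{\mathbf{1}}$ is used (real execution) and e = 0 when the simulator substitutes $c_{\mathbf{0}}$. The sketch below (single slot, toy prime, names ours) verifies that for a uniformly random mask $m_k$ the opened value has the same distribution in both cases, whatever z is. This covers only the opened value; that $c_{\mathbf{1}}$ and $c_{\mathbf{0}}$ themselves cannot be told apart rests on the semantic security of the SHE scheme.

```python
from collections import Counter
from typing import List


def opened_value(z: List[int], mask: List[int], e: int, p: int) -> List[int]:
    """Slot-wise plaintext revealed at a Refresh gate: z * e + m_k over F_p."""
    return [(zi * e + mi) % p for zi, mi in zip(z, mask)]


def opened_distribution(z: int, e: int, p: int) -> Counter:
    """Distribution of the single-slot opened value over a uniform mask m_k in F_p."""
    return Counter(opened_value([z], [m], e, p)[0] for m in range(p))
```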

Finally, the function output y is obtained by another distributed decryption of the output ciphertext. However, this step is also not secure unless the ciphertext is randomized again, by pre-multiplication by $c_{\mathbf{1}}$ and the addition of n encryptions of $\mathbf{0}$, where each party contributes one encryption. In the simulation, the simulator gives an encryption of $\chi(y)$ on behalf of one honest party and replaces $c_{\mathbf{1}}$ by $c_{\mathbf{0}}$, making the output ciphertext correspond to the actual output y, even though the circuit is evaluated with zero as the inputs of the honest parties during the simulation (the simulator does not know the real inputs of the honest parties and thus simulates them with zero). A similar idea was also used in [23]; details can be found in the security proof.

Intuitively,  privacy  follows  because  at  any  stage  of  the  computation,  the  keys  of  the  honest  parties  for  the  distributed 
decryption  are  not  revealed  and  so  the  adversary  will  not  be  able  to  decrypt  any  intermediate  ciphertext.  Correctness 
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Protocol  77“ 

Let  denote  an  augmented  circuit  for  a  well  formed  circuit  C  over  Fp  representing  /  and  let  SHE  be  a  threshold  7-levelled 
SHE.  Moreover,  let  V  =  {Pi , . . . ,  P„}  be  the  set  of  n  parties  For  the  session  ID  sid  the  parties  do  the  following: 

Offline  Computation:  Every  party  P;  €  P  does  the  following: 

-  Call  PsetupGen  with  (sid,  i)  and  receive  (sid,  pk,  ek,  dki,  (c^,  1)). 

-  Randomly  select  C,  plaintexts  mi,i, . . . ,  £  M,  and  compute  =  SHE.EnCpk(mi^fe,  Send 

(sid,  i,  (cmi  i ,  7), . . . ,  (cinip;,  7))  to  all  parties  in  P. 

-  Upon  receiving  (sid,  j,  (cm_,  i ,  7), . . . ,  (cm,  ^ ,  7))  from  all  parties  Pj  £  P,  apply  SHE. Add  for  1  <  fc  <  C,  on 

(cmi  ^ ,  7), . . . ,  (cm„  ,  7),  set  the  resultant  ciphertext  as  the  feth  offline  ciphertext  Cnn,  with  the  (unknown)  associated  plain¬ 
text  rrifc  =  mi,fc  H - +  m„,fc. 

Online  Computation:  Every  party  Pi  £  P  does  the  following: 

-  Input  Stage:  On  having  input  Xi  £  Fp,  compute  (cx^,  1)  =  SHE.LowerLevelek(SHE.EnCpk(x(a:i)!  d),  1)  with  randomness 
Ti  and  send  (sid,  i,  (cx; ,  1))  to  each  party.  Receive  (sid,  j,  (cx^ ,  1))  from  each  party  Pj  £  P. 

-  Computation  Stage:  Associate  the  ciphertexts  received  with  the  corresponding  input  wires  of  C^''®  and  then  homomorphically 
evaluate  the  circuit  gate  by  gate  as  follows: 

•  Addition  Gate  and  Multiplication  Gate:  Given  (ci,  li)  and  (c2,  [2)  associated  with  the  input  wires  of  the  gate  where 
li,  b  £  [1,  •  •  • ,  7],  locally  compute  (c,  1)  =  SHE.Addek((ci,  li),  (c2,  [2))  with  I  =  min  (L,  b)  for  an  addition  gate  and 
(c,  1)  =  SHE.Multek((ci,  b),  (c2,  b))  with  I  =  min  (li,  I2)  —  1  for  a  multiplication  gate;  for  the  multiplication  gate, 
b,  b  £  [2, . . . ,  7],  instead  of  [1, ... ,  7].  Associate  (c,  I)  with  the  output  wire  of  the  gate. 

•  Refresh  Gate:  For  the  fcth  refresh  gate  in  the  circuit,  the  fcth  offline  ciphertext  (Cm^ ,  7)  is  used.  Let  (ci,b),---,(cjv,liv) 
be  the  ciphertexts  associated  with  the  input  wires  of  the  refresh  gate  where  b,  •  •  • ,  Iat  £  [1, . . . ,  7]: 

*  Packing:  Locally  compute  (cz,  1)  =  SHE.Packek({(ci,  b)}ig[i,...,jv])  where  I  =  min  (b,  •  •  • ,  liv). 

*  Masking:  Locally  compute  (cz+nn, ,  0)  =  SHE.Addek(SHE.Multek((cz,  1),  (g,  1)),  (cnn,,  L)) 

*  Decrypting:  Locally  compute  the  decryption  share  /ii  =  SHE.ShareDeCdk;  (cz+nn, ,  0)  and  send  {s\d, 

to  every  other  party.  On  receiving  {s\d,  j,p,j)  from  every  Pj  £  P,  compute  the  plaintext  z  -|-  uik  = 
SHE.ShareCombine((cz+nifc,0),  {p.j}jeli,...,n])- 

*  Re-encrypting:  Locally  re-encrypt  z  -|-  itik  by  computing  (cz+m;.,  7)  =  SHE.EnCpk(z  -|-  m*,,  r)  using  a  publicly 
known  (common)  randomness  r,  (This  can  simply  be  the  zero  string  for  our  BGV  instantiation,  we  only  need  to  map 
the  known  plaintext  into  a  ciphertext  element). 

*  Unmasking:  Locally  subtract  (cm;, ,  7)  from  (cz+mk ,  7)  to  obtain  (cz,  7). 

*  Unpacking:  Locally  compute  (ci,  7), . . . ,  (cat,  7)  =  SHE.Unpackgk(cz,  7)  and  associate  (ci,  7), . . . ,  (cAt,  7)  with 
the  output  wires  of  the  refresh  gate. 

-  Output  Stage:  Let  (c,  1)  be  the  ciphertext  associated  with  the  output  wire  of  where  I  £  [1, . . . ,  7]. 

•  Randomization:  Compute  a  random  encryption  (ci,  7)  =  SHE.EnCpk(0,  r')  of  0  =  (0, . . . ,  0)  and  send  (sid,  i,  (ci,  7)) 
to  every  other  party.  On  receiving  (sid,  j,  (c^,  7))  from  every  Pj  £  P,  apply  SHE. Add  on  {(cj,  7)}jg[i  to  obtain 
(co,  7).  Compute  (c,  0)  =  SHE.Addek(SHE.Multek((c,  1),  (ci,  1)),  (co,  7)). 

•  Output  Decryption:  Compute  7^  =  SHE.ShareDecdk;  (c,  0)  and  send  (sid,  i,  7i)  to  every  party.  On  receiving  (sid,  7i) 
from  every  Pj  £  P,  compute  y  =  SHE.ShareCombine((c,  0),  {7j}jG[i....,ni),  output  y  and  halt,  where  y  =  X~^(y). 


Fig. 3. The Protocol for Realizing $\mathcal{F}_f$ against a Semi-Honest Adversary in the $\mathcal{F}_{\mathsf{SetupGen}}$-hybrid Model
follows from the properties of the SHE and the fact that the level of each ciphertext associated with a wire remains in the range $[1, \dots, L]$, thanks to the refresh gates. So even though the circuit $C$ may have arbitrary depth $d > L$, we can homomorphically evaluate $C$ using an $L$-levelled SHE.
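As an illustration of this level bookkeeping, the following Python sketch traces one refresh gate of Fig. 3 through a toy, plaintext-level model. It is not an implementation of the protocol: `ToyCt`, `enc`, `refresh` and the other helpers are hypothetical names, nothing is actually encrypted (a "ciphertext" only records its slot values and its level), the modulus `P = 101` and `L = 3` are arbitrary, and the threshold decryption is replaced by simply reading off the masked plaintext. The only rules modelled are that SHE.Add keeps the lower of the two levels and SHE.Mult consumes one level.

```python
import random
from dataclasses import dataclass

P = 101  # toy plaintext modulus (hypothetical, not a real SHE parameter)
L = 3    # toy number of levels (hypothetical)


@dataclass(frozen=True)
class ToyCt:
    """Plaintext-level stand-in for an SHE ciphertext. NO encryption happens:
    it only records slot values and the level, to trace level bookkeeping."""
    slots: tuple
    level: int


def enc(slots, level=L):
    # Fresh "encryption" at the given level (level L by default).
    return ToyCt(tuple(s % P for s in slots), level)


def add(a, b):
    # Homomorphic addition: the result sits at the lower of the two levels.
    return ToyCt(tuple((x + y) % P for x, y in zip(a.slots, b.slots)), min(a.level, b.level))


def sub(a, b):
    return ToyCt(tuple((x - y) % P for x, y in zip(a.slots, b.slots)), min(a.level, b.level))


def mult(a, b):
    # Homomorphic multiplication consumes one level; level 0 allows no further products.
    level = min(a.level, b.level) - 1
    if level < 0:
        raise ValueError("no levels left for a multiplication")
    return ToyCt(tuple((x * y) % P for x, y in zip(a.slots, b.slots)), level)


def pack(cts):
    # Pack single-slot wire ciphertexts into one ciphertext at the minimum input level.
    return ToyCt(tuple(c.slots[0] for c in cts), min(c.level for c in cts))


def unpack(ct):
    return [ToyCt((s,), ct.level) for s in ct.slots]


def offline_mask(n_parties, n_slots, rng):
    # Each party contributes a level-L "encryption" of a random m_{i,k};
    # their homomorphic sum plays the role of the offline ciphertext (c_{m_k}, L).
    c_mask = enc([0] * n_slots)
    for _ in range(n_parties):
        c_mask = add(c_mask, enc([rng.randrange(P) for _ in range(n_slots)]))
    return c_mask


def refresh(wire_cts, c_one, c_mask):
    c_z = pack(wire_cts)                      # level l = min(l_1, ..., l_N)
    c_masked = add(mult(c_z, c_one), c_mask)  # level 0, plaintext z + m_k
    revealed = c_masked.slots                 # stands in for ShareDec + ShareCombine
    c_fresh = enc(revealed)                   # re-encryption with public randomness, level L
    return unpack(sub(c_fresh, c_mask))       # level L, plaintext z


rng = random.Random(1)
wires = [enc([5], 1), enc([17], 2), enc([42], 3)]
out = refresh(wires, enc([1, 1, 1], 1), offline_mask(4, 3, rng))
print([(c.slots[0], c.level) for c in out])  # [(5, 3), (17, 3), (42, 3)]
```

Whatever the input levels, the unpacked outputs come back at level $L$ with their values unchanged, and the only value the distributed decryption would reveal is $z + m_k$.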

We work in the standard Universal Composability (UC) framework of Canetti [13], with static corruption. The UC framework introduces a PPT environment $\mathcal{Z}$ that is invoked on the security parameter $1^\kappa$ and an auxiliary input $z$, and oversees the execution of a protocol in one of two worlds. The “ideal” world execution involves dummy parties $P_1, \dots, P_n$, an ideal adversary $\mathcal{S}$ who may corrupt some of the dummy parties, and a functionality $\mathcal{F}$. The “real” world execution involves the PPT parties $P_1, \dots, P_n$ and a real-world adversary $\mathcal{A}$ who may corrupt some of the parties. In either world, a PPT adversary can corrupt $t$ out of the $n$ parties. The environment $\mathcal{Z}$ chooses the inputs of the parties and may interact with the ideal/real adversary during the execution. At the end of the execution, it has to decide whether a real or an ideal world execution has taken place, and it outputs this decision.

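We let $\mathrm{IDEAL}_{\mathcal{F}, \mathcal{S}, \mathcal{Z}}(1^\kappa, z)$ denote the random variable describing the output of the environment $\mathcal{Z}$ after interacting with the ideal execution with adversary $\mathcal{S}$ and the functionality $\mathcal{F}$, on the security parameter $1^\kappa$ and $z$. Let $\mathrm{IDEAL}_{\mathcal{F}, \mathcal{S}, \mathcal{Z}}$ denote the ensemble $\{\mathrm{IDEAL}_{\mathcal{F}, \mathcal{S}, \mathcal{Z}}(1^\kappa, z)\}_{\kappa \in \mathbb{N}, z \in \{0,1\}^*}$. Similarly, let $\mathrm{REAL}_{\Pi, \mathcal{A}, \mathcal{Z}}(1^\kappa, z)$ denote the random variable describing the output of the environment $\mathcal{Z}$ after interacting in a real execution of a protocol $\Pi$ with adversary $\mathcal{A}$ and the parties $\mathcal{P}$, on the security parameter $1^\kappa$ and $z$. Let $\mathrm{REAL}_{\Pi, \mathcal{A}, \mathcal{Z}}$ denote the ensemble $\{\mathrm{REAL}_{\Pi, \mathcal{A}, \mathcal{Z}}(1^\kappa, z)\}_{\kappa \in \mathbb{N}, z \in \{0,1\}^*}$.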

Definition 3. For $n \in \mathbb{N}$, let $\mathcal{F}$ be an $n$-ary functionality and let $\Pi$ be an $n$-party protocol. We say that $\Pi$ securely realizes $\mathcal{F}$ if for every PPT real-world adversary $\mathcal{A}$ there exists a PPT ideal-world adversary $\mathcal{S}$, corrupting the same parties, such that the following two distributions are computationally indistinguishable:

$$\mathrm{IDEAL}_{\mathcal{F}, \mathcal{S}, \mathcal{Z}} \overset{c}{\approx} \mathrm{REAL}_{\Pi, \mathcal{A}, \mathcal{Z}}.$$

We instantiate the above definition for different classes of adversaries: passive or active, corrupting up to a given number of parties.

Theorem 1. Let $f : \mathbb{F}_p^n \rightarrow \mathbb{F}_p$ be a function over $\mathbb{F}_p$ represented by a well-formed arithmetic circuit $C$ of depth $d$ over $\mathbb{F}_p$. Let $\mathcal{F}_f$ (presented in Figure 1) be the ideal functionality computing $f$ and let SHE be a threshold $L$-levelled SHE scheme. Then the protocol $\Pi_f^{\mathsf{SH}}$ UC-securely realizes $\mathcal{F}_f$ against a static, semi-honest adversary $\mathcal{A}$ corrupting up to $t < n$ parties in the $\mathcal{F}_{\mathsf{SetupGen}}$-hybrid model.

Proof. We prove the theorem with respect to a generic $L$-levelled SHE scheme and first consider correctness. Suppose that in the protocol party $P_i$ has input $x_i \in \mathbb{F}_p$. We claim that the following invariant holds for each wire $w$ of the circuit during the execution of the protocol: if $(c, l)$ is the ciphertext associated with $w$ during the execution of the protocol, where the level $l \in [1, \dots, L]$, then $\mathsf{SHE.Dec}_{sk}(c, l) = \chi(z)$, where $z \in \mathbb{F}_p$ is the value that would have been associated with $w$ during the evaluation of the circuit on input $x = (x_1, \dots, x_n)$. Before proving the claim, we recall that, due to the introduction of the refresh gates in the augmented circuit and the way the circuit is evaluated, every wire in the circuit has a label in the range $[1, \dots, L]$ and the corresponding ciphertext associated with the wire (during the protocol execution) has a level in the range $[1, \dots, L]$. In addition, the level of the ciphertext associated with a wire is equal to the label of the wire.

Our invariant is clearly true for the input wires. Assuming that the evaluation of the refresh gates is correct, the invariant is also true for the outputs of the refresh gates. That the invariant holds for the rest of the circuit follows from the homomorphic property of the SHE scheme. Finally, the correctness of the refresh gate evaluation follows from the correctness of SHE.Pack and SHE.Unpack, the homomorphic property of the underlying SHE, and the fact that all the ciphertexts used in evaluating a refresh gate have levels in the range $[0, \dots, L]$.

We next prove security. Let $\mathcal{A}$ be a real-world semi-honest adversary corrupting $t < n$ parties and let $\mathcal{T} \subset \mathcal{P}$ denote the set of corrupted parties. We now present an ideal-world adversary (simulator) $\mathcal{S}_f^{\mathsf{SH}}$ for $\mathcal{A}$ in Figure 4. The high-level idea of the simulator is the following: the simulator takes the inputs $\{x_i\}_{P_i \in \mathcal{T}}$ and interacts with $\mathcal{F}_f$ to obtain the function output $y$. The simulator then invokes $\mathcal{A}$ with the inputs $\{x_i\}_{P_i \in \mathcal{T}}$ and simulates, stage by stage, each message that $\mathcal{A}$ would have received in the protocol $\Pi_f^{\mathsf{SH}}$ from the honest parties and from the functionality $\mathcal{F}_{\mathsf{SetupGen}}$.
Simulator

Let SHE be an $L$-levelled SHE scheme. The simulator plays the role of the honest parties and simulates each step of the protocol $\Pi_f^{\mathsf{SH}}$ as follows. The communication of $\mathcal{Z}$ with the adversary $\mathcal{A}$ is handled as follows: every input value received by the simulator from $\mathcal{Z}$ is written on $\mathcal{A}$'s input tape. Likewise, every output value written by $\mathcal{A}$ on its output tape is copied to the simulator's output tape (to be read by the environment $\mathcal{Z}$). The simulator then does the following for the session ID sid:

Offline  Computation: 

- On receiving the message $(sid, i)$ to $\mathcal{F}_{\mathsf{SetupGen}}$ from $\mathcal{A}$ for each $P_i \in \mathcal{T}$, the simulator invokes $(pk, ek, sk, dk_1, \dots, dk_n) = \mathsf{SHE.KeyGen}(1^\kappa, n, t)$, computes $(c_0, 1) = \mathsf{SHE.LowerLevel}_{ek}(\mathsf{SHE.Enc}_{pk}(0, \cdot), 1)$ on behalf of $\mathcal{F}_{\mathsf{SetupGen}}$ and sends $(sid, pk, ek, \{dk_i\}_{P_i \in \mathcal{T}}, (c_0, 1))$ to $\mathcal{A}$.

- For each $P_j \notin \mathcal{T}$, the simulator computes $(c_{m_{j,k}}, L) = \mathsf{SHE.Enc}_{pk}(m_{j,k}, \cdot)$ for $k \in [1, \dots, \zeta]$, for a randomly chosen $m_{j,k} \in \mathcal{M}$, and sends $(sid, j, (c_{m_{j,1}}, L), \dots, (c_{m_{j,\zeta}}, L))$ to $\mathcal{A}$ on behalf of the honest parties.

- On receiving $(sid, i, (c_{m_{i,1}}, L), \dots, (c_{m_{i,\zeta}}, L))$ from $\mathcal{A}$ for every $P_i \in \mathcal{T}$, the simulator decrypts the ciphertexts to get their associated plaintexts $m_{i,1}, \dots, m_{i,\zeta}$, i.e. $m_{i,k} = \mathsf{SHE.Dec}_{sk}(c_{m_{i,k}}, L)$. The simulator then applies SHE.Add on $(c_{m_{1,k}}, L), \dots, (c_{m_{n,k}}, L)$ and sets the resulting ciphertext as the $k$th offline ciphertext. Furthermore, it sets $m_k = m_{1,k} + \dots + m_{n,k}$ as the $k$th offline plaintext.

Online  Computation: 

- Input Stage: For every party $P_j \in \mathcal{P} \setminus \mathcal{T}$, the simulator computes a random encryption $(c_{x_j}, 1) = \mathsf{SHE.LowerLevel}_{ek}(\mathsf{SHE.Enc}_{pk}(\chi(0), \cdot), 1)$ and sends $(sid, j, (c_{x_j}, 1))$ to $\mathcal{A}$ on behalf of every $P_j \in \mathcal{P} \setminus \mathcal{T}$. The simulator receives $(sid, i, (c_{x_i}, 1))$ from $\mathcal{A}$ and obtains the associated plaintext $\mathbf{x}_i$. On behalf of the parties $P_i \in \mathcal{T}$, the simulator sends $(sid, i, x_i)$ to the functionality $\mathcal{F}_f$ and receives $y$, where $x_i = \chi^{-1}(\mathbf{x}_i) \in \mathbb{F}_p$.

-  Computation  Stage:  The  simulator  performs  the  local  computation  (required  for  the  addition,  multiplication  and  refresh  gates) 
as  specified  in  the  protocol  in  order  to  be  synchronized  with  the  adversary  with  respect  to  the  ciphertexts  associated  with  the 
wires  in  the  circuit.  For  the  refresh  gates,  the  simulator  simulates  to  A  the  communication  from  the  honest  parties  as  follows: 

• Refresh Gate: Let this be the $k$th refresh gate and let $(c_{m_k}, L)$ be the $k$th offline ciphertext with the associated plaintext $m_k$, both of which are known to the simulator from simulating the offline computation. Let $(c, 0)$ be the ciphertext obtained after the masking operation. Since $c_1$ is replaced by $c_0$ in the simulation, $c$ is associated with the message $m_k$. For each $P_i \in \mathcal{T}$, on receiving $(sid, i, \mu_i)$ from $\mathcal{A}$ as the decryption shares of $(c, 0)$, the simulator computes the simulated decryption shares $\{\mu_j\}_{P_j \notin \mathcal{T}} = \mathsf{SHE.ShareSim}((c, 0), m_k, \{\mu_i\}_{P_i \in \mathcal{T}})$. The simulator then sends the simulated shares $\{\mu_j\}_{P_j \notin \mathcal{T}}$ to $\mathcal{A}$ as the decryption shares on behalf of the honest parties.

-  Output  Stage: 

• Randomization: On receiving $(sid, i, (c_i, L))$ from $\mathcal{A}$ for every $P_i \in \mathcal{T}$, the simulator computes encryptions of $\chi(0)$ for every honest party except one, say $P_h$, for which it encrypts $\chi(y)$. The simulator sends these ciphertexts to $\mathcal{A}$ on behalf of the honest parties and then follows the protocol steps to obtain $(c, 0)$ corresponding to the output wire. Note that the plaintext associated with $c$ is $\chi(y)$, since $c_1$ is replaced by $c_0$ in the simulation and one of the ciphertexts sent on behalf of an honest party (for randomization) encrypts $\chi(y)$.

• On receiving the decryption share $(sid, i, \gamma_i)$ from $\mathcal{A}$ for every $P_i \in \mathcal{T}$, the simulator computes the simulated decryption shares $\{\gamma_j\}_{P_j \in \mathcal{P} \setminus \mathcal{T}} = \mathsf{SHE.ShareSim}((c, 0), \chi(y), \{\gamma_i\}_{P_i \in \mathcal{T}})$ for the honest parties $P_j \in \mathcal{P} \setminus \mathcal{T}$ and sends $(sid, j, \gamma_j)$ to $\mathcal{A}$ as the decryption shares.

The simulator then outputs $\mathcal{A}$'s output.


Fig. 4. Simulator for the semi-honest adversary $\mathcal{A}$ corrupting $t$ parties in the set $\mathcal{T} \subset \mathcal{P}$.

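We will now prove that $\mathrm{IDEAL}_{\mathcal{F}_f, \mathcal{S}_f^{\mathsf{SH}}, \mathcal{Z}} \overset{c}{\approx} \mathrm{REAL}_{\Pi_f^{\mathsf{SH}}, \mathcal{A}, \mathcal{Z}}$ via a series of hybrids. The output of each hybrid is always just the output of the environment $\mathcal{Z}$. Starting with $\mathrm{HYB}_0 = \mathrm{REAL}_{\Pi_f^{\mathsf{SH}}, \mathcal{A}, \mathcal{Z}}$, we gradually make changes to define $\mathrm{HYB}_1$, $\mathrm{HYB}_2$, $\mathrm{HYB}_3$ and $\mathrm{HYB}_4$.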

$\mathrm{HYB}_1$: Same as $\mathrm{HYB}_0$, except that the decryption shares of the honest parties corresponding to the ciphertext $c$ associated with the output wire (obtained after the randomization) are computed using SHE.ShareSim, by giving it as input the decryption shares of the corrupted parties corresponding to $c$, the ciphertext $c$ and the plaintext $\chi(y)$, where $y$ is the function output.

$\mathrm{HYB}_2$: Same as $\mathrm{HYB}_1$, except that $c_1$ obtained from $\mathcal{F}_{\mathsf{SetupGen}}$ is replaced by $c_0$, and the circuit is computed as in the protocol with $c_0$ used in place of $c_1$. Moreover, during the randomization step of the distributed decryption of the output wire ciphertext, the randomizing ciphertext $(c_h, L)$ of one of the honest parties, say $P_h$ (which is an encryption of $\mathbf{0}$), is replaced by a random encryption of $\chi(y)$.

$\mathrm{HYB}_3$: Same as $\mathrm{HYB}_2$, except that SHE.ShareSim is used to compute the decryption shares of the honest parties when performing the distributed decryption during the evaluation of the refresh gates.

$\mathrm{HYB}_4$: Same as $\mathrm{HYB}_3$, except that the real inputs of the honest parties are replaced by $\chi(0)$ during the Input Stage, and the circuit is evaluated using encryptions of $\chi(0)$ as the encrypted inputs of the honest parties.

Our proof concludes once we show that every two consecutive hybrids are computationally indistinguishable and that $\mathrm{HYB}_4 = \mathrm{IDEAL}_{\mathcal{F}_f, \mathcal{S}_f^{\mathsf{SH}}, \mathcal{Z}}$.


$\mathrm{HYB}_0 \overset{c}{\approx} \mathrm{HYB}_1$: This follows from the share simulation indistinguishability property of SHE.
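What share simulation asks for can be pictured with a deliberately simplified model in which decryption shares are uniform additive shares of the plaintext modulo a small prime. This Python sketch is an assumption-laden illustration, not the scheme's SHE.ShareDec or SHE.ShareSim: the function names, the modulus and the additive structure are hypothetical, and real lattice-based decryption shares also carry decryption noise that is not modelled here. In this toy the simulation is even perfect, because any $n - 1$ additive shares are independent and uniform, so the corrupted shares together with the simulated honest shares have the same joint distribution as a real sharing of the same plaintext.

```python
import random

P = 101  # toy modulus (hypothetical)


def share_dec_toy(m, n, rng):
    """Stand-in for the n parties' decryption shares of a ciphertext with plaintext m:
    a uniform additive sharing of m modulo P (no decryption noise is modelled)."""
    shares = [rng.randrange(P) for _ in range(n - 1)]
    shares.append((m - sum(shares)) % P)
    return shares


def share_combine_toy(shares):
    # Combining additive shares is just summation modulo P.
    return sum(shares) % P


def share_sim_toy(m, corrupt_shares, n_honest, rng):
    """Toy ShareSim: given the target plaintext m and the corrupted parties' shares,
    output honest shares that are uniform subject to everything combining to m."""
    honest = [rng.randrange(P) for _ in range(n_honest - 1)]
    honest.append((m - sum(corrupt_shares) - sum(honest)) % P)
    return honest


rng = random.Random(0)
real = share_dec_toy(42, 5, rng)      # 5 parties, plaintext 42
corrupt = real[:2]                    # the t = 2 corrupted parties' shares
simulated = share_sim_toy(42, corrupt, 3, rng)
print(share_combine_toy(corrupt + simulated))  # 42
```

With a single honest party the simulated share is forced, which is exactly why the simulator must be told the plaintext ($m_k$ for a refresh gate, $\chi(y)$ for the output wire).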


$\mathrm{HYB}_1 \overset{c}{\approx} \mathrm{HYB}_2$: To show indistinguishability, we rely on the semantic security of SHE. In fact, we use a variant of the semantic security notion in which the adversary gives two pairs of messages to the challenger, and the challenger picks one of the pairs at random and gives the encryptions of that pair to the adversary. We call this double message semantic security. It follows by a standard hybrid argument that a scheme offering semantic security also offers double message semantic security, with a security loss of a factor of two.
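For completeness, the factor-two loss can be spelled out as the standard two-step hybrid. The notation is introduced only for this derivation: $\mathcal{B}$ is any PPT adversary against double message semantic security that submits the pairs $(m_0, m_0')$ and $(m_1, m_1')$, all encryptions use $pk$ with fresh randomness, and $\epsilon_{\mathsf{IND}}$ is an assumed bound on the advantage of any PPT adversary in the ordinary (single message) semantic security game.

```latex
\begin{aligned}
H_0 &= \bigl(\mathsf{Enc}_{pk}(m_0), \mathsf{Enc}_{pk}(m_0')\bigr), \qquad
H_1 = \bigl(\mathsf{Enc}_{pk}(m_1), \mathsf{Enc}_{pk}(m_0')\bigr), \qquad
H_2 = \bigl(\mathsf{Enc}_{pk}(m_1), \mathsf{Enc}_{pk}(m_1')\bigr), \\
\mathsf{Adv}^{\mathsf{2msg}}_{\mathcal{B}}
&= \bigl|\Pr[\mathcal{B}(H_0) = 1] - \Pr[\mathcal{B}(H_2) = 1]\bigr| \\
&\le \bigl|\Pr[\mathcal{B}(H_0) = 1] - \Pr[\mathcal{B}(H_1) = 1]\bigr|
   + \bigl|\Pr[\mathcal{B}(H_1) = 1] - \Pr[\mathcal{B}(H_2) = 1]\bigr|
 \le 2\,\epsilon_{\mathsf{IND}}.
\end{aligned}
```

Each of the two differences is bounded by a single message adversary that embeds its own challenge ciphertext in the position that changes and computes the other ciphertext itself from $pk$.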

We now show how a distinguisher $\mathcal{Z}$ for the hybrids $\mathrm{HYB}_1$ and $\mathrm{HYB}_2$ can be used to break the double message semantic security of the underlying SHE. Let $\mathcal{R}$ be the attacker that wants to break the double message semantic security of the underlying SHE; $\mathcal{R}$ uses $\mathcal{Z}$ to do so as follows. $\mathcal{R}$ receives the public key $pk$, the evaluation key $ek$ and the $t$ keys corresponding to the corrupted parties for performing the distributed decryption. The attacker $\mathcal{R}$ then invokes $\mathcal{Z}$ (in her head), which gives back the inputs $(x_1, \dots, x_n) \in \mathbb{F}_p^n$ of all the parties. Using these inputs, $\mathcal{R}$ computes the function output $y$, prepares two pairs of messages, $(1, 0)$ and $(0, \chi(y))$, and hands them over to the challenger. Let $\mathcal{R}$ receive back the encrypted pair $(c', L), (c, L)$ from the challenger. The algorithm $\mathcal{R}$ now applies SHE.LowerLevel to reduce the first of these to level one (by abuse of notation we still refer to it as $c'$). Now $\mathcal{R}$ evaluates the circuit, generating the offline data honestly and using $(c', 1)$ in place of $(c_1, 1)$ (which was to be returned by $\mathcal{F}_{\mathsf{SetupGen}}$), and $(c, L)$ in place of the randomizing ciphertext (namely an encryption of $\mathbf{0}$) that the honest party $P_h$ would have provided to randomize the output wire ciphertext. Finally, $\mathcal{R}$ outputs what $\mathcal{Z}$ outputs.

It is easy to see that if the challenger gave encryptions of the first pair of messages, namely $(1, 0)$, then $\mathcal{Z}$ is in $\mathrm{HYB}_1$; otherwise it is in $\mathrm{HYB}_2$. Thus the distinguishing advantage of $\mathcal{Z}$ translates directly into the winning advantage of $\mathcal{R}$ in the double message semantic security game. This implies that our claim is true: no PPT distinguisher $\mathcal{Z}$ can tell the above two hybrids apart with non-negligible advantage.
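The reason both branches of this reduction hand the same plaintext to SHE.ShareSim can be checked at the plaintext level. The Python sketch below is only an illustration under stated assumptions: slot vectors modulo a toy prime `P = 101` stand in for plaintexts, nothing is encrypted, the helper names are hypothetical, and the vector used for $\chi(y)$ is an arbitrary example. It also shows why replacing $c_1$ by $c_0$ makes the value revealed at each refresh gate depend only on the offline mask $m_k$, which is the fact reused in the step from $\mathrm{HYB}_3$ to $\mathrm{HYB}_4$.

```python
P = 101  # toy modulus (hypothetical)


def add(a, b):
    return tuple((x + y) % P for x, y in zip(a, b))


def mul(a, b):
    return tuple((x * y) % P for x, y in zip(a, b))


def output_plaintext(wire, selector, randomizers):
    # Plaintext behind Add(Mult(c, c_sel), sum of randomizing ciphertexts),
    # where c_sel is the setup ciphertext (c_1 in the protocol, c_0 in HYB_2).
    total = tuple(0 for _ in wire)
    for r in randomizers:
        total = add(total, r)
    return add(mul(wire, selector), total)


def refresh_revealed(z, selector, mask):
    # Plaintext that is jointly decrypted at a refresh gate: z * sel + m_k.
    return add(mul(z, selector), mask)


ones, zeros = (1, 1, 1), (0, 0, 0)
chi_y = (7, 0, 0)  # hypothetical encoding chi(y) of the output y

# HYB_1 (challenge pair (1, 0)): selector encrypts ones, every randomizer encrypts 0.
hyb1 = output_plaintext(chi_y, ones, [zeros, zeros, zeros])
# HYB_2 (challenge pair (0, chi(y))): selector encrypts zeros and P_h's randomizer
# encrypts chi(y); the output wire may now carry any value, here (13, 8, 2).
hyb2 = output_plaintext((13, 8, 2), zeros, [zeros, chi_y, zeros])
print(hyb1 == hyb2 == chi_y)                               # True
print(refresh_revealed((5, 17, 42), zeros, (90, 3, 61)))   # (90, 3, 61)
```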


$\mathrm{HYB}_2 \overset{c}{\approx} \mathrm{HYB}_3$: This can be shown by relying on the share simulation indistinguishability property of SHE and by defining $\zeta$ hybrids over the number of refresh gates, where the $i$th hybrid is the same as $\mathrm{HYB}_2$, except that SHE.ShareSim is invoked for the first $i$ refresh gates (assuming a topological ordering of the gates) to compute the decryption shares of the honest parties, and from the $(i+1)$th refresh gate onwards the decryption shares of the honest parties are computed as in the real protocol using SHE.ShareDec.

$\mathrm{HYB}_3 \overset{c}{\approx} \mathrm{HYB}_4$: We resort to the semantic security of the underlying SHE scheme. We let $H = |\mathcal{P} \setminus \mathcal{T}|$ denote the number of honest parties and, without loss of generality, assume that the first $H$ parties are the honest parties. We introduce $H + 1$ hybrids $\mathrm{HYB}_3^0 = \mathrm{HYB}_3, \mathrm{HYB}_3^1, \dots, \mathrm{HYB}_3^H = \mathrm{HYB}_4$ over the number of honest parties, so that the $i$th hybrid $\mathrm{HYB}_3^i$ is the same as the $(i-1)$th hybrid $\mathrm{HYB}_3^{i-1}$, except that the input of the $i$th honest party is replaced by $\chi(0)$. We now show that $\mathrm{HYB}_3^{i-1} \overset{c}{\approx} \mathrm{HYB}_3^i$ for $i \in [1, \dots, H]$, which lets us conclude that $\mathrm{HYB}_3 \overset{c}{\approx} \mathrm{HYB}_4$. We fix an $i$ and show that any $\mathcal{Z}_i$ that tells apart $\mathrm{HYB}_3^{i-1}$ and $\mathrm{HYB}_3^i$ can be turned into an attacker that breaks the semantic security of the SHE scheme.

Let $\mathcal{R}$ be the attacker that wants to break the semantic security of the SHE. The attacker participates in the semantic security game and receives from the challenger $pk$, $ek$ and the $t$ keys corresponding to the corrupted parties for performing the distributed decryption. It then invokes $\mathcal{Z}_i$ (in her head) to receive the inputs of the parties, say $(x_1, \dots, x_n)$, and computes the function output $y$. The attacker prepares two messages, $\chi(0)$ and $\chi(x_i)$, for the challenger, the latter being received from $\mathcal{Z}_i$ as the input of $P_i$ (namely $x_i$). In return, the attacker gets back $(c_{x_i}, L)$, which encrypts either $\chi(0)$ or $\chi(x_i)$. Now the attacker computes encryptions of $\chi(0)$ for the first $(i-1)$ parties; for $P_i$ it uses $c_{x_i}$ received from the challenger, and for the remaining parties it computes encryptions of $\chi(x_{i+1}), \dots, \chi(x_n)$. The attacker $\mathcal{R}$ then honestly evaluates the circuit on these encrypted inputs, ensuring all the similarities between $\mathrm{HYB}_3^{i-1}$ and $\mathrm{HYB}_3^i$. Namely, the attacker performs the offline computation honestly and uses $(c_0, 1)$ (an encryption of 0) instead of $(c_1, 1)$ (as received from $\mathcal{F}_{\mathsf{SetupGen}}$). Moreover, while performing the randomization during the distributed decryption of the output wire ciphertext, the attacker uses an encryption of $\chi(y)$ as the randomizing ciphertext on behalf of the honest party $P_h$ (instead of an encryption of $\mathbf{0}$), so as to make the output wire ciphertext an encryption of $\chi(y)$. Furthermore, the attacker uses SHE.ShareSim to compute the decryption shares of the honest parties while performing the distributed decryption for the refresh gates and for the output wire. Note that the attacker knows the plaintext associated with each ciphertext to be decrypted (both for the refresh gates and for the output wire) when invoking SHE.ShareSim, even without knowing the actual circuit input of the party $P_i$ (namely the plaintext associated with the challenge ciphertext $(c_{x_i}, L)$) used for the circuit evaluation. This is because $c_0$ (instead of $c_1$) is now multiplied with the ciphertexts that are to be decrypted in the protocol, so the post-multiplication ciphertexts have associated plaintext 0, irrespective of the actual circuit inputs. This allows $\mathcal{R}$ to invoke SHE.ShareSim on a ciphertext whose associated plaintext it knows, even without knowing the inputs to the circuit. More specifically, for every refresh gate $\mathcal{R}$ knows the plaintext associated with the ciphertext to be decrypted, since it depends solely on the data created in the offline computation, which is known to $\mathcal{R}$. For the output wire, $\mathcal{R}$ knows the plaintext associated with the ciphertext to be decrypted, since it is just the circuit output $\chi(y)$. Finally, at the end of the circuit evaluation as above, $\mathcal{R}$ outputs what $\mathcal{Z}_i$ outputs.

Now note that if the challenge ciphertext $(c_{x_i}, L)$ is an encryption of $\chi(x_i)$, then $\mathcal{Z}_i$ is in $\mathrm{HYB}_3^{i-1}$; otherwise it is in $\mathrm{HYB}_3^i$. The above reduction thus shows that $\mathcal{R}$ can distinguish between encryptions of $\chi(x_i)$ and $\chi(0)$ with the same probability with which $\mathcal{Z}_i$ can distinguish between $\mathrm{HYB}_3^{i-1}$ and $\mathrm{HYB}_3^i$. This implies that our claim is true.
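The $H$ steps combine by a telescoping sum and the triangle inequality. For a fixed distinguisher $\mathcal{Z}$, write $p_i = \Pr[\mathcal{Z}(\mathrm{HYB}_3^i) = 1]$; both $p_i$ and $\epsilon_{\mathsf{IND}}$ (an assumed bound on any PPT adversary's semantic security advantage) are notation introduced only for this derivation. The same computation, with share-simulation advantages in place of $\epsilon_{\mathsf{IND}}$, bounds the step from $\mathrm{HYB}_2$ to $\mathrm{HYB}_3$ over the refresh gates.

```latex
\begin{aligned}
\bigl|\Pr[\mathcal{Z}(\mathrm{HYB}_3) = 1] - \Pr[\mathcal{Z}(\mathrm{HYB}_4) = 1]\bigr|
&= |p_0 - p_H|
 = \Bigl|\sum_{i=1}^{H} (p_{i-1} - p_i)\Bigr| \\
&\le \sum_{i=1}^{H} |p_{i-1} - p_i|
 \le H \cdot \epsilon_{\mathsf{IND}}
 \le n \cdot \epsilon_{\mathsf{IND}}.
\end{aligned}
```

Since $n$ is polynomial in the security parameter, the bound stays negligible whenever $\epsilon_{\mathsf{IND}}$ is.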


$\mathrm{HYB}_4 = \mathrm{IDEAL}_{\mathcal{F}_f, \mathcal{S}_f^{\mathsf{SH}}, \mathcal{Z}}$: This follows by inspection, since the following steps are performed both in $\mathrm{HYB}_4$ and in $\mathrm{IDEAL}_{\mathcal{F}_f, \mathcal{S}_f^{\mathsf{SH}}, \mathcal{Z}}$: (1) $c_1$ is replaced by $c_0$; (2) the inputs of the honest parties are replaced by $\chi(0)$; (3) SHE.ShareSim is invoked to compute the decryption shares of the honest parties for all the refresh gates as well as in the output computation stage; and (4) one honest party's randomizing ciphertext is an encryption of $\chi(y)$ instead of an encryption of $\mathbf{0}$.

Thus we have proved the following claim, which in turn concludes the proof of the theorem.


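Claim. $\mathrm{IDEAL}_{\mathcal{F}_f, \mathcal{S}_f^{\mathsf{SH}}, \mathcal{Z}} \overset{c}{\approx} \mathrm{REAL}_{\Pi_f^{\mathsf{SH}}, \mathcal{A}, \mathcal{Z}}$.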

□ 


5 MPC from SHE - The Active Setting

The functionalities from Section 4 are in the passive corruption model. In the presence of an active adversary, the functionalities are modified as follows: each functionality considers the inputs received from the majority of the parties and performs its task on those inputs. For example, in the case of $\mathcal{F}_f$, the functionality considers for the computation those $x_i$s corresponding to the $P_i$s from which it has received the message $(sid, i, x_i)$; on behalf of the remaining $P_i$s, the functionality substitutes 0 as the default input for the computation. Similarly, $\mathcal{F}_{\mathsf{SetupGen}}$ performs its task if it receives the message $(sid, i)$ from the majority of the parties. These are the standard notions of defining ideal functionalities for various corruption scenarios, and we refer to [32] for the complete formal details; we will not separately present the ideal functionalities $\mathcal{F}_f$ and $\mathcal{F}_{\mathsf{SetupGen}}$ for the malicious setting.
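The input-substitution rule can be written as a few lines of Python. This is a hypothetical sketch of the bookkeeping only: the function names, the dictionary of received inputs and the strict-majority test are illustrative choices made here, and [32] remains the reference for the formal definitions.

```python
from typing import Callable, Dict, Iterable, Sequence


def ideal_f_active(f: Callable[[Sequence[int]], int], n: int,
                   received: Dict[int, int], p: int) -> int:
    """Evaluate f on the inputs that arrived; every P_i whose (sid, i, x_i)
    message is missing gets the default input 0 (parties indexed 1..n)."""
    inputs = [received.get(i, 0) % p for i in range(1, n + 1)]
    return f(inputs) % p


def setup_runs(n: int, requesters: Iterable[int]) -> bool:
    """F_SetupGen performs its task only if a majority of the n parties sent (sid, i)."""
    return len(set(requesters)) > n / 2


# Parties P_2 and P_4 never send their inputs, so they are replaced by 0.
print(ideal_f_active(lambda xs: sum(xs), 4, {1: 10, 3: 5}, 101))  # 15
print(setup_runs(5, [1, 2, 3]), setup_runs(4, [1, 2]))            # True False
```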
A closer look at $\Pi_f^{\mathsf{SH}}$ shows that we can “compile” it into an actively secure MPC protocol tolerating $t$ active corruptions if we ensure that every corrupted party “proves” in a zero-knowledge (ZK) fashion that it constructed the following correctly: (1) the ciphertexts in the offline phase; (2) the ciphertexts during the input stage; and (3) the randomizing ciphertexts during the output stage.

Apart from the above three requirements, we also require a “robust” version of the SHE.ShareCombine method, which works correctly even if up to $t$ of the input decryption shares are incorrect. In Section 6 we show that for our specific SHE scheme, the SHE.ShareCombine algorithm (based on standard error correction) is indeed robust, provided $t < n/3$. For the case $t < n/2$ we also show that, by including additional steps and zero-knowledge proofs (namely proofs of correct decryption), one can also obtain a robust output. Interestingly the MPC protocol requires
the  transmission  of  at  most  O(n^)  such  additional  zero-knowledge  proofs;  i.e.  the  communication  needed  to  obtain 
robustness is independent of the circuit. We stress that $t < n/2$ is the optimal resilience for computationally secure MPC against active corruptions (with robustness and fairness) [15,33]. To keep the protocol presentation and its proof simple, we assume a robust SHE.ShareCombine (i.e. the case $t < n/3$), which applies error correction to obtain the correct decryption. In the same section, we further present a more efficient offline phase attaining a linear communication overhead (asymptotically) in the number of preprocessed ciphertexts.
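As a rough illustration of robust combination by error correction, the following Python sketch assumes (hypothetically) that the values to be combined behave like Shamir shares of a degree-$t$ polynomial and recovers the shared value with the textbook Berlekamp-Welch decoder. The actual SHE.ShareCombine of Section 6 is not reproduced here, and all function names are illustrative. The decoder needs $n \ge \deg + 2t + 1$ points; with degree $t$ this is $n \ge 3t + 1$, matching the condition $t < n/3$.

```python
import random


def poly_eval(coeffs, x, p):
    # Horner evaluation; coefficients are listed from lowest to highest degree.
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def solve_mod(rows, rhs, p):
    """One solution of rows * v = rhs over GF(p) (free variables set to 0), or None."""
    m, k = len(rows), len(rows[0])
    a = [[v % p for v in row] + [b % p] for row, b in zip(rows, rhs)]
    pivots, r = [], 0
    for col in range(k):
        piv = next((i for i in range(r, m) if a[i][col]), None)
        if piv is None:
            continue
        a[r], a[piv] = a[piv], a[r]
        inv = pow(a[r][col], -1, p)
        a[r] = [v * inv % p for v in a[r]]
        for i in range(m):
            if i != r and a[i][col]:
                f = a[i][col]
                a[i] = [(vi - f * vr) % p for vi, vr in zip(a[i], a[r])]
        pivots.append(col)
        r += 1
        if r == m:
            break
    if any(a[i][k] for i in range(r, m)):  # a row reads 0 = nonzero: inconsistent
        return None
    sol = [0] * k
    for i, col in enumerate(pivots):
        sol[col] = a[i][k]
    return sol


def poly_divmod(num, den, p):
    # Long division over GF(p); den must have a nonzero leading coefficient.
    num = num[:]
    inv_lead = pow(den[-1], -1, p)
    quot = [0] * max(len(num) - len(den) + 1, 1)
    for shift in range(len(num) - len(den), -1, -1):
        coef = num[shift + len(den) - 1] * inv_lead % p
        quot[shift] = coef
        for j, d in enumerate(den):
            num[shift + j] = (num[shift + j] - coef * d) % p
    return quot, num[:len(den) - 1]


def robust_combine(points, deg, t, p):
    """Berlekamp-Welch: recover f(0) for deg(f) <= deg from n >= deg + 2t + 1
    evaluations (x_i, y_i), at most t of which are wrong."""
    if len(points) < deg + 2 * t + 1:
        raise ValueError("not enough points to correct t errors")
    rows, rhs = [], []
    for x, y in points:
        # Unknowns: q_0..q_{deg+t} of Q, then e_0..e_{t-1} of the monic degree-t E;
        # each point gives the linear equation Q(x) - y * (E(x) - x^t) = y * x^t.
        rows.append([pow(x, j, p) for j in range(deg + t + 1)]
                    + [(-y * pow(x, j, p)) % p for j in range(t)])
        rhs.append(y * pow(x, t, p) % p)
    sol = solve_mod(rows, rhs, p)
    if sol is None:
        raise ValueError("more than t errors")
    f_poly, rem = poly_divmod(sol[:deg + t + 1], sol[deg + t + 1:] + [1], p)
    if any(rem):
        raise ValueError("more than t errors")
    return poly_eval(f_poly, 0, p)


def shamir_shares(secret, t, n, p, rng):
    coeffs = [secret % p] + [rng.randrange(p) for _ in range(t)]
    return [(x, poly_eval(coeffs, x, p)) for x in range(1, n + 1)]


p, n, t = 2_147_483_647, 7, 2             # prime field, n >= 3t + 1
shares = shamir_shares(123456, t, n, p, random.Random(2015))
shares[1] = (2, (shares[1][1] + 99) % p)  # corrupt the share of P_2
shares[4] = (5, (shares[4][1] + 1) % p)   # corrupt the share of P_5
print(robust_combine(shares, deg=t, t=t, p=p))  # 123456
```

Two of the seven shares are wrong, yet the decoder still returns the shared value. Once $t \ge n/3$, $t$ errors exceed the unique-decoding radius of this code, so plain error correction is no longer guaranteed to succeed, which is consistent with the text's use of proofs of correct decryption for $t < n/2$.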


Functionality $\mathcal{F}_{\mathsf{ZK}}^{R}$

$\mathcal{F}_{\mathsf{ZK}}^{R}$ interacts with a prover $P_i \in \{P_1, \dots, P_n\}$, the set of $n$ verifiers $\mathcal{P} = \{P_1, \dots, P_n\}$ and the adversary $\mathcal{S}$.

- Upon receiving $(sid, i, (x, w))$ from the prover $P_i \in \{P_1, \dots, P_n\}$, the functionality sends $(sid, i, x)$ to all the verifiers in $\mathcal{P}$ and to $\mathcal{S}$ if $R(x, w)$ is true. Otherwise, it sends $(sid, i, \perp)$ and halts.

Fig.  5.  The  Ideal  Functionality  for  ZK 
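A direct transcription of Fig. 5 into Python may help when reading the relations that follow. The class name, the method signature and the discrete-logarithm relation in the example are hypothetical stand-ins; in the protocol the relation would be $R_{\mathsf{enc}}$ or $R_{\mathsf{zeroenc}}$, which require an actual SHE.Enc and are not modelled here. Returning the message stands in for sending it to every verifier and to $\mathcal{S}$.

```python
from typing import Any, Callable, Optional, Tuple

BOTTOM = "BOT"  # ASCII stand-in for the rejection symbol


class IdealZK:
    """Sketch of the ideal ZK functionality for an NP relation R(x, w)."""

    def __init__(self, relation: Callable[[Any, Any], bool]):
        self.relation = relation
        self.halted = False

    def prove(self, sid: str, i: int, x: Any, w: Any) -> Optional[Tuple[str, int, Any]]:
        # The returned tuple is what every verifier in P (and the adversary S) receives.
        if self.halted:
            return None  # a halted functionality ignores further messages
        if self.relation(x, w):
            return (sid, i, x)
        self.halted = True
        return (sid, i, BOTTOM)


def r_dlog(x, w):
    # Hypothetical example relation: x = (g, h, q) and w satisfies g^w = h (mod q).
    g, h, q = x
    return pow(g, w, q) == h


zk = IdealZK(r_dlog)
print(zk.prove("sid", 2, (3, 13, 23), 5))  # ('sid', 2, (3, 13, 23))
print(zk.prove("sid", 2, (3, 13, 23), 4))  # ('sid', 2, 'BOT')
```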


The actively secure MPC protocol uses the ideal ZK functionality $\mathcal{F}_{\mathsf{ZK}}^{R}$ of Figure 5, parametrized by an NP-relation $R$. We apply this ZK functionality to the following relations to obtain the functionalities $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ and $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}}$:

- $R_{\mathsf{enc}} = \{((c, l), (x, r)) \mid (c, l) = \mathsf{SHE.Enc}_{pk}(x, r) \text{ if } l = L \ \vee\ (c, l) = \mathsf{SHE.LowerLevel}_{ek}(\mathsf{SHE.Enc}_{pk}(x, r), 1) \text{ if } l = 1\}$: we require this relation to hold for the offline stage ciphertexts (where $l = L$) and for the input stage ciphertexts (where $l = 1$).

- $R_{\mathsf{zeroenc}} = \{((c, L), (x, r)) \mid (c, L) = \mathsf{SHE.Enc}_{pk}(x, r) \wedge x = \mathbf{0}\}$: we require this relation to hold for the randomizing ciphertexts during the output stage.

UC-secure realizations of $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ and $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}}$ can be obtained in the CRS model; techniques similar to these are used in [2]. Finally, we do not worry about the instantiation of $\mathcal{F}_{\mathsf{SetupGen}}$, as we consider it a one-time set-up, which can be done via standard techniques (such as running an MPC protocol).

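We are now ready to present the protocol $\Pi_f^{\mathsf{MAL}}$ (see Figure 6) in the $(\mathcal{F}_{\mathsf{SetupGen}}, \mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}, \mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}})$-hybrid model, assuming a robust SHE.ShareCombine based on error correction (i.e. for the case $t < n/3$).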

Theorem 2. Let $f : \mathbb{F}_p^n \rightarrow \mathbb{F}_p$ be a function represented by a well-formed arithmetic circuit $C$ over $\mathbb{F}_p$. Let $\mathcal{F}_f$ (presented in Figure 1) be the ideal functionality computing $f$ and let SHE be a threshold $L$-levelled SHE scheme such that SHE.ShareCombine is robust. Then the protocol $\Pi_f^{\mathsf{MAL}}$ UC-securely realises $\mathcal{F}_f$ in the $(\mathcal{F}_{\mathsf{SetupGen}}, \mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}, \mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}})$-hybrid model against a static, active adversary $\mathcal{A}$ corrupting $t$ parties.

Proof. Since the robust SHE.ShareCombine works correctly even in the presence of $t$ active corruptions, the correctness of our MPC protocol follows from the properties of $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ and $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}}$ and the same arguments as used in Theorem 1. More specifically, the properties of $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ ensure that during the offline computation, each corrupted $P_i$ knows the plaintext associated with the ciphertext $c_{m_{i,k}}$. For the same reason, each corrupted $P_i$ knows the plaintext (namely the input) $\chi(x_i)$ associated with the ciphertext $c_{x_i}$. Moreover, the property of $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}}$ ensures that
Protocol $\Pi_f^{\mathsf{MAL}}$

Let $C$ be the well-formed arithmetic circuit over $\mathbb{F}_p$ representing the function $f$, consider an augmented circuit associated with $C$, and let SHE be a threshold $L$-levelled SHE scheme. For session ID sid the parties in $\mathcal{P} = \{P_1, \dots, P_n\}$ do the following:

Offline Computation: Every party $P_i \in \mathcal{P}$ does the following:

- Call $\mathcal{F}_{\mathsf{SetupGen}}$ with $(sid, i)$ and receive $(sid, pk, ek, dk_i, (c_1, 1))$.

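- Same as in the offline phase of $\Pi_f^{\mathsf{SH}}$, except that for every message $m_{i,k}$ for $k \in [1, \dots, \zeta]$ and the corresponding ciphertext $(c_{m_{i,k}}, L) = \mathsf{SHE.Enc}_{pk}(m_{i,k}, r_{i,k})$, call $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ with $(sid, i, ((c_{m_{i,k}}, L), (m_{i,k}, r_{i,k})))$. Receive $(sid, j, (c_{m_{j,k}}, L))$ from $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ for $k \in [1, \dots, \zeta]$ corresponding to each $P_j \in \mathcal{P}$. If $(sid, j, \perp)$ is received from $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ for some $P_j \in \mathcal{P}$, then consider $\zeta$ publicly known level-$L$ encryptions of random values from $\mathcal{M}$ as $(c_{m_{j,k}}, L)$ for $k \in [1, \dots, \zeta]$.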

Online Computation: Every party $P_i \in \mathcal{P}$ does the following:

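- Input Stage: On having input $x_i \in \mathbb{F}_p$, compute the level-1 ciphertext $(c_{x_i}, 1) = \mathsf{SHE.LowerLevel}_{ek}(\mathsf{SHE.Enc}_{pk}(\chi(x_i), \rho_i), 1)$ with randomness $\rho_i$ and call $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ with the message $(sid, i, ((c_{x_i}, 1), (\chi(x_i), \rho_i)))$. Receive $(sid, j, (c_{x_j}, 1))$ from $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ corresponding to each $P_j \in \mathcal{P}$. If $(sid, j, \perp)$ is received from $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}$ for some $P_j \in \mathcal{P}$, then consider a publicly known level-1 encryption of $\chi(0)$ as $(c_{x_j}, 1)$ for such a $P_j$.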

- Computation Stage: Same as in $\Pi_f^{\mathsf{SH}}$, except that now the robust SHE.ShareCombine is used.

- Output Stage: Let $(c, l)$ be the ciphertext associated with the output wire of the augmented circuit, where $l \in [1, \dots, L]$.

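• Randomization: Compute a random encryption $(c_i, L) = \mathsf{SHE.Enc}_{pk}(\mathbf{0}, r'_i)$ of $\mathbf{0} = (0, \dots, 0)$ and call $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}}$ with the message $(sid, i, ((c_i, L), (\mathbf{0}, r'_i)))$. Receive $(sid, j, (c_j, L))$ from $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}}$ corresponding to each $P_j \in \mathcal{P}$. If $(sid, j, \perp)$ is received from $\mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}}$ for some $P_j \in \mathcal{P}$, then consider a publicly known level-$L$ encryption of $\mathbf{0}$ as $(c_j, L)$ for such a $P_j$.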

• The rest of the steps are the same as in $\Pi_f^{\mathsf{SH}}$, except that now the robust SHE.ShareCombine is used.


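Fig. 6. The Protocol for Realizing $\mathcal{F}_f$ against an Active Adversary in the $(\mathcal{F}_{\mathsf{SetupGen}}, \mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{enc}}}, \mathcal{F}_{\mathsf{ZK}}^{R_{\mathsf{zeroenc}}})$-hybrid Model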
each corrupted $P_i$ has indeed contributed an encryption of $\mathbf{0}$ as a randomizing ciphertext during the distributed decryption of the output wire ciphertext. The homomorphic property of the SHE ensures that the addition and multiplication gates are evaluated correctly. We next argue that the refresh gates are also evaluated correctly. This follows because, once the parties have access to the offline data, each refresh gate can be evaluated correctly if the parties are able to decrypt the corresponding masked ciphertext $c_{z+m_k}$. Since the robust SHE.ShareCombine works even in the presence of $t$ active corruptions, the parties can decrypt $c_{z+m_k}$. For the same reason, the parties are able to decrypt the ciphertext associated with the output wire and hence obtain the function output.

We next prove security. Let $\mathcal{A}$ be a real-world active adversary corrupting up to $t$ parties and let $\mathcal{T} \subset \mathcal{P}$ denote the set of corrupted parties. We now present an ideal-world adversary (simulator) $\mathcal{S}_f^{\mathsf{MAL}}$ for $\mathcal{A}$ in Figure 7; for simplicity, we assume that an SHE with a robust, non-interactive SHE.ShareCombine (i.e. for $t < n/3$) has been used in the MPC protocol. The indistinguishability between the real and ideal worlds then follows mostly by arguments similar to those given for the semi-honest case (see the proof of Theorem 1).

□ 
Simulator

Let SHE be a threshold $L$-levelled SHE scheme. The simulator plays the role of the honest parties and simulates each step of the protocol $\Pi_f^{\mathsf{MAL}}$ as follows. The communication of $\mathcal{Z}$ with the adversary $\mathcal{A}$ is handled as follows: every input value received by the simulator from $\mathcal{Z}$ is written on $\mathcal{A}$'s input tape. Likewise, every output value written by $\mathcal{A}$ on its output tape is copied to the simulator's output tape (to be read by the environment $\mathcal{Z}$). The simulator then does the following for session ID sid:

Offline  Computation: 

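- On receiving the message $(sid, i)$ to $\mathcal{F}_{\mathsf{SetupGen}}$ from $\mathcal{A}$ for each $P_i \in \mathcal{T}$, invoke $(pk, ek, sk, dk_1, \dots, dk_n) = \mathsf{SHE.KeyGen}(1^\kappa, n)$, compute $(c_0, 1) = \mathsf{SHE.LowerLevel}_{ek}(\mathsf{SHE.Enc}_{pk}(0, \cdot), 1)$, and send $(sid, pk, ek, \{dk_i\}_{P_i \in \mathcal{T}}, (c_0, 1))$ to $\mathcal{A}$.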

- For each party P_j ∉ T and k ∈ [1, ..., ℓ], compute (c_{m_{jk}}, L) = SHE.Enc_pk(m_{jk}, ·) for a randomly chosen m_{jk} ∈ M, and send (sid, j, (c_{m_{jk}}, L)) to A on behalf of F_ZK. For each P_i ∈ T, on receiving (sid, i, (c_{m_{ik}}, L), (m_{ik}, r_{ik})) as a message to F_ZK from A for k ∈ [1, ..., ℓ], verify whether (c_{m_{ik}}, L) = SHE.Enc_pk(m_{ik}, r_{ik}). If the verification fails for some P_i ∈ T, then send (sid, i, ⊥) ℓ times (corresponding to the ℓ ciphertexts) to A and use publicly known level L encryptions of random values from M as (c_{m_{ik}}, L) for k ∈ [1, ..., ℓ]. Compute the kth ciphertext and the kth plaintext of the offline phase as in Π^{MAL}; the simulator can do this since it knows all the plaintexts.

Online Computation:

-  Input  Stage: 

• For every party P_j ∈ P \ T, compute a random encryption (c_{x_j}, 1) = SHE.LowerLevel_ek(SHE.Enc_pk(χ(0), ·), 1) and send (sid, j, (c_{x_j}, 1)) to A on behalf of F_ZK. For each P_i ∈ T, on receiving (sid, i, (c_{x_i}, 1), (χ(x_i), r_i)) as a message to F_ZK from A, verify whether (c_{x_i}, 1) = SHE.LowerLevel_ek(SHE.Enc_pk(χ(x_i), r_i), 1), and send (sid, i, ⊥) to A if the verification fails. Use a publicly known ciphertext (c_{x_i}, 1) encrypting χ(0) (i.e. x_i = 0) on behalf of any such P_i.

• Send (sid, i, x_i) to F_f on behalf of each P_i ∈ T and receive the function output y.

- Computation Stage: The simulator acts in the same way as the simulator for the semi-honest case, except that whenever A sends the decryption shares corresponding to the parties in T during the evaluation of the refresh gates, the simulator ignores them; instead it computes these decryption shares itself using the keys dk_i (for the distributed decryption) corresponding to P_i ∈ T (the simulator knows dk_i for every P_i ∈ T since it generated them itself). These new decryption shares are then fed to SHE.ShareSim to obtain the simulated decryption shares corresponding to the honest parties, which the simulator then sends to A on behalf of the honest parties.

-  Output  Stage: 

• Randomization: Let H = P \ T be the set of honest parties and let P_h be some party in H. For every P_j ∈ H \ {P_h}, compute a random encryption (c_j, L) = SHE.Enc_pk(0, ·), while for P_h compute a random encryption (c_h, L) = SHE.Enc_pk(χ(y), ·). For every P_j ∈ H, send (sid, j, (c_j, L)) to A on behalf of F_ZK.

• For each P_i ∈ T, on receiving (sid, i, (c_i, L), (0, r_i)) as a message to F_ZK from A, verify whether (c_i, L) = SHE.Enc_pk(0, r_i). If the verification fails for some P_i ∈ T, then send (sid, i, ⊥) to A and use a publicly known level L encryption of 0 as (c_i, L) for such a P_i.

• On receiving the decryption shares from A corresponding to the parties P_i ∈ T, the simulator ignores them, recomputes them using the keys dk_i, and feeds them to SHE.ShareSim to compute the simulated decryption shares for the honest parties. Finally, it sends the simulated shares to A on behalf of the honest parties.

The simulator then outputs A's output.


Fig. 7. Simulator for the active adversary A corrupting t parties in the set T ⊂ P.
6 Obtaining a Robust Protocol


In this section we discuss how to achieve a robust SHE.ShareCombine for our concrete SHE scheme, and then present a modified offline phase with linear communication overhead.

Recall that in our concrete SHE scheme, the SHE.ShareCombine algorithm takes as input a set of shares obtained via Shamir secret sharing over the ring R_{q_0}. From this observation it is clear, by the standard error correction properties of Reed-Solomon codes (upon which Shamir secret sharing is based), that one immediately obtains a robust SHE.ShareCombine algorithm in the case t < n/3.
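To make the t < n/3 case concrete, here is a minimal Python sketch of robust reconstruction by error correction. It is an illustration, not the paper's procedure: it treats a single coefficient over a prime field Z_q (the scheme shares elements of R_{q_0}), and instead of an efficient Reed-Solomon decoder such as Berlekamp-Welch it simply searches over subsets of t + 1 shares, which is only reasonable for the small n assumed later in this section. The helper names are hypothetical.

```python
import random
from itertools import combinations

def lagrange_at(points, x, q):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod prime q."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, -1, q)) % q
    return total

def robust_combine(shares, t, q):
    """Recover the secret from shares {i: s_i} when at most t are wrong and t < n/3."""
    n = len(shares)
    assert 3 * t < n, "this decoder needs t < n/3"
    pts = sorted(shares.items())
    for subset in combinations(pts, t + 1):
        cand = list(subset)
        # Accept a degree-<=t candidate that agrees with at least n - t shares.
        if sum(lagrange_at(cand, x, q) == y for x, y in pts) >= n - t:
            return lagrange_at(cand, 0, q)
    raise ValueError("more than t shares are wrong")

# Usage: threshold t = 2 among n = 7 parties, with two corrupted shares.
q, n, t = 2**61 - 1, 7, 2
secret = 123456789
coeffs = [secret] + [random.randrange(q) for _ in range(t)]
shares = {i: sum(c * pow(i, k, q) for k, c in enumerate(coeffs)) % q for i in range(1, n + 1)}
shares[3] = (shares[3] + 1) % q
shares[5] = (shares[5] + 12345) % q
print(robust_combine(shares, t, q))  # 123456789
```

The search is safe because of uniqueness: two polynomials of degree at most t that each agree with n - t of the shares agree with each other on at least n - 2t ≥ t + 1 points, so they coincide.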

All that remains is to present a robust SHE.ShareCombine for the case t < n/2. We present the protocol in Figure 8 (note that SHE.ShareCombine will now be a protocol instead of a local algorithm, as it may involve interaction among the parties); it uses the dispute-control framework proposed in [3] and the fact that Reed-Solomon codes can detect up to t < n/2 errors. The protocol also invokes the ZK functionality for the relation R_sharedec a limited number of times for the proof of correct (distributed) decryption, where R_sharedec is given below.

$$R_{\mathsf{sharedec}} = \{\, (((c, l), \mu_i), \mathsf{dk}_i) \mid \mu_i = \mathsf{SHE.ShareDec}_{\mathsf{dk}_i}(c, l) \,\}$$

Unlike the functionality defined in Figure 5, which treats all the parties in P as verifiers, it is enough for the functionality for R_sharedec to be defined in a single-prover, single-verifier setting. For simplicity, however, we do not elaborate on this.

Our robust SHE.ShareCombine realises the following idea. For distributed decryption, as usual, every party sends its decryption share to every other party. A party P_i, on receiving the decryption shares, first checks whether all of them lie on a unique polynomial of degree at most t (error detection). If no error is detected, then the secret can be safely reconstructed. However, if some error is detected, then P_i "complains" to the parties, asking them to prove the correctness of the decryption shares they sent earlier; the parties respond with ZK proofs by calling the functionality F_ZK^{R_sharedec}. Now P_i can identify the providers of incorrect decryption shares and ignore their shares in future instances of distributed decryption. Each party P_i keeps a list H_i of the parties that it believes to be honest so far. Care has to be taken to ensure that the honest parties do not respond "too many times" to false complaints issued by the corrupted parties; this is resolved by keeping counters for the complaints. The idea is that an honest P_j will complain to an honest P_i at most t times, since each genuine complaint leads P_j to remove at least one corrupted party from H_j; thus any complaint from P_j after the t-th one clearly indicates that the complaint is false and that P_j is corrupted. It is now easy to see that, using this trick, the total number of calls to F_ZK^{R_sharedec} will be O(n^3), which is independent of the circuit size: a party may have to provide a ZK proof to another party (by calling F_ZK^{R_sharedec}) in at most t instances of distributed decryption, and there are O(n^2) pairs of parties. For large circuits, the extra communication cost to obtain a robust SHE.ShareCombine in the case n/3 ≤ t < n/2 can therefore be safely ignored.
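The error-detection test that P_i runs before complaining can be sketched in the same single-coefficient style (a hypothetical illustration over a prime field, not the paper's code). Detection suffices for t < n/2 because at least n - t ≥ t + 1 shares are correct, so any polynomial of degree at most t that is consistent with all shares must be the correct one; an inconsistency, however, does not reveal which shares are wrong, which is why the ZK proofs are needed.

```python
def lagrange_at(points, x, q):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod prime q."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, -1, q)) % q
    return total

def shares_consistent(shares, t, q):
    """True iff all shares {j: s_j} lie on a single polynomial of degree <= t (mod prime q)."""
    pts = sorted(shares.items())
    base = pts[: t + 1]
    return all(lagrange_at(base, x, q) == y for x, y in pts[t + 1:])

def try_combine(shares, honest, t, q):
    """One step of P_i: drop shares from parties outside H_i, then detect errors.
    Returns the secret, or None to signal that P_i must send a complaint."""
    kept = {j: s for j, s in shares.items() if j in honest}
    assert len(kept) > t, "H_i always contains the n - t >= t + 1 honest parties"
    if shares_consistent(kept, t, q):
        return lagrange_at(sorted(kept.items())[: t + 1], 0, q)
    return None

q, n, t = 2**61 - 1, 5, 2
shares = {i: (777 + 5 * i + 9 * i * i) % q for i in range(1, n + 1)}  # f(X) = 777 + 5X + 9X^2
bad = dict(shares)
bad[4] = (bad[4] + 1) % q                                              # P_4 cheats
print(try_combine(shares, set(range(1, n + 1)), t, q))  # 777
print(try_combine(bad, set(range(1, n + 1)), t, q))     # None (complain)
print(try_combine(bad, {1, 2, 3, 5}, t, q))             # 777 once P_4 is removed from H_i
```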
SHE.ShareCombine

Each party P_i maintains its local copy H_i of a list of all the parties which it currently assumes to be honest. Initially each H_i = {P_1, ..., P_n}. Apart from this, every party P_i maintains n counters cnt_{i,1}, ..., cnt_{i,n}, where cnt_{i,j} counts the number of times an error message has been received from the party P_j; initially all these counters are set to 0. To execute an SHE.ShareCombine((c, l), {μ_j}_{j∈{1,...,n}}) operation, where μ_j has been sent by P_j, party P_i performs the following steps:

- Ignore all μ_j where P_j ∉ H_i. If the remaining μ_j's lie on a unique polynomial of degree at most t, then output the corresponding secret (namely the constant term of the polynomial). Otherwise, send a message (sid, i, Error_i, (c, l)) to every party P_j ∈ H_i.

- If an error message (sid, j, Error_j, (c, l)) has been received from some P_j ∈ H_i, then check whether cnt_{i,j} < t. If cnt_{i,j} < t, then call F_ZK^{R_sharedec} with the message (sid, i, j, (((c, l), μ_i), dk_i)) and set cnt_{i,j} := cnt_{i,j} + 1. Else, if cnt_{i,j} ≥ t, then remove P_j from the list H_i.

- If an error message (sid, i, Error_i, (c, l)) has been sent in the first step, then execute the following: receive (sid, j, i, ((c, l), μ_j)) from F_ZK^{R_sharedec} for every P_j ∈ H_i. If for some P_j ∈ H_i the message (sid, j, i, ⊥) is received from F_ZK^{R_sharedec}, then remove P_j from H_i. Using the μ_j's corresponding to the P_j ∈ H_i, interpolate the polynomial of degree at most t and output its constant term as the secret.


Fig. 8. Robust SHE.ShareCombine for t < n/2
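The bookkeeping of Figure 8 (the list H_i and the complaint counters cnt_{i,j}) can be sketched as the following hypothetical Python class. The calls to the ZK functionality are not modelled; the methods only report whether P_i must produce a proof, and record the outcome of proofs P_i receives.

```python
class ShareCombineState:
    """Hypothetical per-party state for the robust SHE.ShareCombine of Figure 8."""

    def __init__(self, i, n, t):
        self.i, self.t = i, t
        self.honest = set(range(1, n + 1))            # H_i = {P_1, ..., P_n}
        self.cnt = {j: 0 for j in range(1, n + 1)}    # cnt_{i,j} = 0

    def on_error_message(self, j):
        """P_j sent (sid, j, Error_j, (c, l)). Return True iff P_i must prove its share to P_j."""
        if j not in self.honest:
            return False                     # complaints from parties outside H_i are ignored
        if self.cnt[j] < self.t:
            self.cnt[j] += 1
            return True                      # P_i would call the ZK functionality for R_sharedec
        self.honest.discard(j)               # more than t complaints: P_j must be corrupted
        return False

    def on_proof_result(self, j, ok):
        """After P_i's own complaint: remove P_j if its proof failed (bottom received)."""
        if not ok:
            self.honest.discard(j)

s = ShareCombineState(i=1, n=5, t=2)
print([s.on_error_message(4) for _ in range(3)], sorted(s.honest))  # [True, True, False] [1, 2, 3, 5]
```

Bounding the answers per complainer by t is what caps the number of proofs independently of the circuit size.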


6.1  An  Improved  Offline  Phase  (sketch) 


From the analysis in Section 9, we find that the online communication complexity of our protocol is Cost = O(n · |G_M|) (in the asymptotic sense). We now sketch how we can modify our offline computation so that, asymptotically, the communication complexity of the offline phase is O(n · ℓ), where ℓ ≥ |G_M| is the number of random ciphertexts generated in the offline phase. We need the following three tools:


- Multi-valued Broadcast with O(n) Overhead [27]: This protocol allows a sender Sen ∈ {P_1, ..., P_n} to send a message m of size ℓ "identically" to all n parties (even if Sen is corrupted). The protocol can tolerate up to t < n/2 faults (even if the adversary is computationally unbounded) and has communication complexity O(nℓ), provided ℓ = Ω(n^…).

- Randomness Extraction [33, 24]: Given a set of n encryptions of random values, t of which may be known to the adversary, the randomness extraction algorithm based on a super-invertible matrix [33] or a Vandermonde matrix [24] allows the parties to (locally) compute encryptions of (n - t) random values unknown to the adversary.

- Non-interactive Zero Knowledge Proofs: We require a UC-secure instantiation of F_ZK such that a party P_i ∈ {P_1, ..., P_n}, on computing encryptions of ℓ random values, can publicly prove to anyone that it knows the associated plaintexts by "attaching" a proof of size O(ℓ). Such proofs can be obtained, for example, using the techniques of [2].

Now the offline phase protocol will proceed as follows: every party P_i computes encryptions of C random elements along with a NIZK proof that it knows the associated plaintexts, where C = ℓ/(n - t). Party P_i then broadcasts the ciphertexts along with the proof by acting as Sen in an instance of the multi-valued broadcast protocol. The ciphertexts received from the different parties are then viewed as C batches of ciphertexts, where the jth batch consists of the jth ciphertext broadcast by each party, for j ∈ [1, ..., C]. Finally, the parties apply the randomness extraction algorithm to each batch of ciphertexts to obtain (n - t) random ciphertexts from each batch, and in total C · (n - t) = ℓ ciphertexts. Assuming C = Ω(n^…), the total communication cost for the offline phase is now O(n · ℓ): each of the n instances of the broadcast protocol has communication complexity O(n · C) = O(ℓ), as (n - t) = Θ(n). It is easy to see that the output ciphertexts are indeed random, as there exist at least (n - t) honest parties contributing to each batch of ciphertexts. Note that we do not require any powerful (but somewhat complex) tools like player elimination, as used in the MPC protocol of [33] (whose communication complexity is also O(n · ℓ)).
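Because randomness extraction is a public linear map, the parties can apply it to ciphertexts using only SHE.Add and SHE.ScalarMult. The sketch below shows the map on plaintext values over a prime field Z_q with q > n (an assumption made for illustration), using a Vandermonde construction r_j = Σ_i β_i^j v_i with β_i = i; the function name is hypothetical. The usage check confirms the property the protocol relies on: with the t corrupted inputs fixed, distinct honest inputs give distinct outputs, so uniformly random honest inputs give uniformly random outputs.

```python
from itertools import product

def extract(values, t, q):
    """Map n values to n - t outputs r_j = sum_i i^j * v_i mod q (j = 0, ..., n - t - 1).

    Any n - t of the input positions give an invertible Vandermonde system (the
    points beta_i = i are distinct modulo the prime q > n), so the outputs are
    uniform whenever the n - t honest inputs are, whatever the other t inputs are.
    """
    n = len(values)
    return [sum(pow(i, j, q) * v for i, v in enumerate(values, start=1)) % q
            for j in range(n - t)]

# Usage: n = 4, t = 1, q = 7; party 2 is corrupted and always contributes 5.
outs = {tuple(extract([a, 5, b, c], 1, 7)) for a, b, c in product(range(7), repeat=3)}
print(len(outs))  # 343 = 7**3: honest inputs map bijectively onto outputs
```

In the offline phase above, one such extraction would be applied per batch, i.e. to the jth ciphertext of every party.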
7 Instantiating our FHE using BGV


In this section we show an instantiation of SHE based on the scheme of Brakerski, Gentry and Vaikuntanathan (BGV) [10]. As in [6], we make use of Shamir secret sharing to share the secret key among the parties, and of pseudorandom secret sharing (PRSS) [17] to non-interactively share a pseudorandom value from a chosen interval. We describe a variant of the BGV-type cryptosystems based on the ring learning with errors (RLWE) assumption [36], naturally supporting the packing operations described in Section 3.

7.1  Preliminaries 

Plaintext Space: We define the polynomial ring R := Z[x]/(f(x)), where f(x) is a monic irreducible polynomial. For our purposes it will suffice to fix f(x) as the cyclotomic polynomial Φ_m(x) = x^{m/2} + 1 with m a power of two. We set N = φ(m) = m/2, where φ is the Euler totient function. The ring R is the ring of integers of the mth cyclotomic number field Q(ζ_m), with ζ_m a primitive mth root of unity. Denote by R_q := R/qR, for an integer q, the reduction of R modulo q, i.e. the set of all integer polynomials of degree at most N - 1 with coefficients in (-q/2, q/2].

Looking ahead, the plaintext space of the scheme will be defined to be R_p := R/pR for some prime p such that p ≡ 1 (mod m). Since p ≡ 1 (mod m), the polynomial Φ_m(x) splits into N distinct linear factors F_i(x) modulo p:

$$\mathcal{M} := R_p \cong \mathbb{Z}_p[x]/(F_1(x)) \times \cdots \times \mathbb{Z}_p[x]/(F_N(x)) \cong \mathbb{F}_p^N,$$

where each factor corresponds to an independent "plaintext slot" holding an element of the finite field F_p. Thus each message m ∈ M actually corresponds to N messages in F_p and can be represented as an N-vector (m mod F_i)_{i=1}^{N}. By the Chinese Remainder Theorem, addition and multiplication in R_p correspond to SIMD (Single Instruction Multiple Data) operations on the slots, and this allows us to process N input values at once, as described in Section 3.
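The slot structure can be checked on toy numbers (insecure, for illustration only): with m = 8 we have N = 4 and Φ_8(x) = x^4 + 1, and p = 17 ≡ 1 (mod 8), so x^4 + 1 has four roots modulo 17. Evaluating a polynomial at these roots gives its slot vector, and the sketch below (plain Python, hypothetical helper names) checks that multiplication in R_p acts slot-wise.

```python
import random

P, M = 17, 8          # toy parameters (assumption): p = 17 ≡ 1 (mod m), m = 8
N = M // 2            # Phi_8(x) = x^4 + 1
ROOTS = [z for z in range(P) if (pow(z, N, P) + 1) % P == 0]   # one root per slot

def mul(a, b):
    """Multiply coefficient lists in Z_p[x]/(x^N + 1) (negacyclic convolution)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                res[k] = (res[k] + ai * bj) % P
            else:
                res[k - N] = (res[k - N] - ai * bj) % P   # x^N = -1
    return res

def slots(a):
    """CRT decomposition: evaluate a(x) at each root of Phi_m modulo p."""
    return [sum(c * pow(z, k, P) for k, c in enumerate(a)) % P for z in ROOTS]

a = [random.randrange(P) for _ in range(N)]
b = [random.randrange(P) for _ in range(N)]
print(ROOTS)                                                                 # [2, 8, 9, 15]
print(slots(mul(a, b)) == [x * y % P for x, y in zip(slots(a), slots(b))])  # True
```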

If we consider the Galois group Gal of Q(ζ_m), then Gal = Gal(Q(ζ_m)/Q) ≅ Z_m^*, and it is formed by the mappings σ_i : a(x) ↦ a(x^i) mod Φ_m(x) for all i ∈ Z_m^*. It is well known [30] that Gal acts transitively on the plaintext slots, i.e. for all i, j ∈ {1, ..., N} there exists an element σ_{i→j} ∈ Gal which sends an element in slot i to an element in slot j.
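Continuing the same toy example (m = 8, p = 17), the sketch below implements σ_i : a(x) ↦ a(x^i) mod (x^4 + 1) and checks that it only permutes slots: the slot of σ_i(a) at a root z holds the value of a at z^i, which is again a root of Φ_8. It then finds, for every slot d, an element σ_i moving slot 0 to slot d, which illustrates transitivity. All names and parameters are illustrative.

```python
import random

P, M = 17, 8                      # toy parameters (assumption): p = 17, m = 8
N = M // 2                        # Phi_8(x) = x^4 + 1
ROOTS = [z for z in range(P) if (pow(z, N, P) + 1) % P == 0]

def slots(a):
    """Evaluate a(x) at every root of Phi_m modulo p (the CRT slots)."""
    return [sum(c * pow(z, k, P) for k, c in enumerate(a)) % P for z in ROOTS]

def galois(a, i):
    """sigma_i: a(x) -> a(x^i) mod (x^N + 1, p) for odd i, using x^N = -1 (so x^M = 1)."""
    res = [0] * N
    for k, c in enumerate(a):
        e = (i * k) % M
        if e < N:
            res[e] = (res[e] + c) % P
        else:
            res[e - N] = (res[e - N] - c) % P
    return res

a = [random.randrange(P) for _ in range(N)]
for i in (1, 3, 5, 7):            # Z_8^* = {1, 3, 5, 7}
    # the slot at root z of sigma_i(a) holds the value of a at z^i
    assert slots(galois(a, i)) == [slots(a)[ROOTS.index(pow(z, i, P))] for z in ROOTS]

# For each slot d, the sigma_i with ROOTS[d]^i = ROOTS[0] moves the value in slot 0 to slot d.
movers = {d: next(i for i in (1, 3, 5, 7) if pow(ROOTS[d], i, P) == ROOTS[0]) for d in range(N)}
print(movers)  # {0: 1, 1: 3, 2: 7, 3: 5}
```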

Random Values: During our construction we will need to sample elements from different distributions over R_q. We will use the following distributions over R, and then map to R_q as appropriate.

- HWT(h, N): This generates a vector of length N with elements chosen from {-1, 0, 1} such that the number of non-zero elements is equal to h.

- ZO(0.5, N): This generates a vector of length N with elements chosen from {-1, 0, 1} such that the coefficient probabilities are p_{-1} = 1/4, p_0 = 1/2 and p_1 = 1/4.

- DG(σ², N): This generates a vector of length N with elements chosen according to the discrete Gaussian distribution with variance σ².

- RC(0.5, σ², N): This generates a triple of elements (a, b, c) where a is sampled from ZO(0.5, N) and b and c are sampled from DG(σ², N).

- U(q, N): This generates a vector of length N with elements generated uniformly modulo q.
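A minimal Python sketch of these samplers follows, with hypothetical function names. One assumption is stated loudly: `dg` rounds a continuous Gaussian, which only approximates a true discrete Gaussian; a real implementation would use an exact discrete Gaussian sampler.

```python
import random

def hwt(h, n):
    """HWT(h, N): entries in {-1, 0, 1} with exactly h non-zero entries."""
    v = [0] * n
    for pos in random.sample(range(n), h):
        v[pos] = random.choice((-1, 1))
    return v

def zo(n):
    """ZO(0.5, N): P(-1) = 1/4, P(0) = 1/2, P(1) = 1/4."""
    return [random.choice((-1, 0, 0, 1)) for _ in range(n)]

def dg(sigma, n):
    """Stand-in for DG(sigma^2, N): rounded continuous Gaussian (approximation, see above)."""
    return [round(random.gauss(0, sigma)) for _ in range(n)]

def rc(sigma, n):
    """RC(0.5, sigma^2, N): a triple (a, b, c) with a from ZO and b, c from DG."""
    return zo(n), dg(sigma, n), dg(sigma, n)

def uniform(q, n):
    """U(q, N): coefficients uniform modulo q (representatives in [0, q))."""
    return [random.randrange(q) for _ in range(n)]

sk = hwt(64, 1024)
print(sum(1 for c in sk if c != 0))  # 64
```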


Pseudorandom Secret Sharing over Polynomial Rings: Pseudorandom secret sharing was first introduced in [17]. Given a setup, a PRSS scheme allows parties to generate an almost unlimited number of Shamir sharings of pseudorandom values without any communication. Furthermore, the setup is generated once and for all and can therefore be reused many times. While known PRSS schemes work over fields or rings [17, 6], for our purposes we will require a PRSS defined over the polynomial rings R_{q_0}.

In [17] the construction of a PRSS was presented. This was used in [6] to construct a PRSS over Z_q, where q = ∏ p_i for n parties, such that each p_i is prime with p_i > n. This construction immediately extends to R_q by computing the underlying PRF N times. For completeness we overview the construction here. Given an element s ∈ R_q, we use [s] for Shamir's sharing of s, and [s]_i = s_i for the ith component of the sharing of s, i = 1, ..., n.

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED

We assume a prior one-time setup which distributes a vector of shared keys k_A = (k_{0,A}, ..., k_{N-1,A}) to each party in A, for every subset A ⊂ P of size n - t. These keys will be used as the keys of a keyed pseudorandom function (PRF) family {ψ_k(·)}_{k∈K}. The pseudorandomness of the output of the following algorithm can be reduced to the security of the underlying PRF at the cost of a security loss by a factor of 1/N.

1. The parties in P agree on N elements t_j ∈ Z_q for j ∈ {0, ..., N - 1}.

2. For j = 0, ..., N - 1, every party P_i ∈ P computes

$$s_{j,i} = \sum_{A \subset \mathcal{P} :\, |A| = n-t,\ P_i \in A} \psi_{k_{j,A}}(t_j) \cdot f_A(i),$$

where f_A(X) denotes the polynomial of degree at most t such that f_A(0) = 1 and f_A(u) = 0 for every P_u ∉ A.

3. For j = 0, ..., N - 1, the value

$$s_j = \sum_{A \subset \mathcal{P} :\, |A| = n-t} \psi_{k_{j,A}}(t_j)$$

denotes the jth pseudorandom shared value from Z_q. Define the associated element in R_q by the polynomial s = Σ_{j=0}^{N-1} s_j X^j.

If the underlying PRF family has range [-T, ..., T] over Z_q, then the output of the above PRSS is an element in R_q whose coefficients lie in the range [-$\binom{n}{t}$T, $\binom{n}{t}$T], since there are $\binom{n}{t}$ subsets A of size n - t. To ease notation we write s = Σ_{A⊂P, |A|=n-t} ψ_{k_A}(t) for the shared value in R_q, and [s]_i = Σ_{A⊂P, |A|=n-t, P_i∈A} ψ_{k_A}(t) · f_A(i) for the shares themselves. We note that in general $\binom{n}{t}$ becomes exponentially large, especially if t is a constant fraction of n; however, in most practical applications of threshold cryptography the number of parties n is expected to be small.

Canonical Embedding Norm: Here we recall some results on cyclotomic fields that we need to estimate the parameters of our protocol instantiations. For details regarding properties of canonical norms we refer to [31, 30, 25]. Given a polynomial a ∈ R, we denote by ||a||_∞ = max_{0≤i≤N-1} |a_i| the standard ℓ_∞-norm. All estimates of noise are taken with respect to the canonical embedding norm ||a||^{can}_∞ = ||σ(a)||_∞, where σ : R → C^N is the canonical embedding defined by σ(a) = (a(ζ_m^k))_{k∈Z_m^*}, with ζ_m a fixed primitive mth root of unity. When a ∈ R_q, for some modulus q, we need the canonical embedding norm reduced modulo q:

$$|a|^{\mathrm{can}}_q = \min\{\, \|a'\|^{\mathrm{can}}_\infty : a' \in R \text{ and } a' \equiv a \pmod q \,\}.$$

To map from norms in the canonical embedding to norms on the coefficients of the polynomials defining the elements in R, we note that ||a||_∞ ≤ c_m · ||a||^{can}_∞, where c_m is the ring constant. Since we fix our base field polynomial to be a power-of-two cyclotomic polynomial, we have c_m = 1.
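The inequality ||a||_∞ ≤ ||a||^{can}_∞ for power-of-two m can be checked numerically with the standard library's complex arithmetic. The sketch below (hypothetical names, toy m = 16) evaluates a at ζ_m^k for odd k; it also checks the easy reverse bound ||a||^{can}_∞ ≤ N · ||a||_∞.

```python
import cmath
import random

def canonical_embedding(a, m):
    """sigma(a) = (a(zeta^k))_{k in Z_m^*} with zeta = exp(2*pi*i/m), for m a power of two."""
    zeta = cmath.exp(2j * cmath.pi / m)
    return [sum(c * zeta ** (k * j) for j, c in enumerate(a)) for k in range(1, m, 2)]

def can_norm(a, m):
    return max(abs(v) for v in canonical_embedding(a, m))

def inf_norm(a):
    return max(abs(c) for c in a)

m = 16                                        # N = m/2 = 8 coefficients
a = [random.randint(-5, 5) for _ in range(m // 2)]
print(inf_norm(a) <= can_norm(a, m) + 1e-9)   # True (c_m = 1)
```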

7.2  The  Basic  L-levelled  Packed  BGV-type  Cryptosystem 

We review the BGV L-levelled packed SHE scheme. The scheme is parametrized by a security parameter κ, for a fixed number of levels L + 1. Note that we use L + 1 levels in our scheme description to make the presentation consistent with the abstract scheme from Section 3. For l = 0, ..., L, fix a chain of moduli q_l = ∏_{i=0}^{l} p_i, where each p_i is a prime number. Encryption generates level L ciphertexts with respect to the largest modulus q_L. At the lth level of the scheme, ciphertexts consist of two elements in R_{q_l}, l = 0, ..., L. Throughout homomorphic evaluation we will enforce a universal bound B on the noise contained in ciphertexts (when measured in the canonical embedding norm reduced modulo q) after a SHE.LowerLevel execution. Since ||a||_∞ ≤ ||a||^{can}_∞ ≤ B, this also provides an upper bound on the coefficients used in the underlying decryption algorithm, for such outputs of SHE.LowerLevel. For a description of the algorithm SHE.LowerLevel see [31], where it is called modulus switching.

However, when applying decryption, or distributed decryption, we will apply the procedure to a ciphertext which is not the direct output of a SHE.LowerLevel operation. In particular, we assume that the canonical norm of the noise of an element passed to the decryption procedure will be bounded by B_dec. The decryption procedure will then return the correct output if B_dec < q_0/2. For distributed decryption we will need to "boost" this bound to 2^{sec} · B_dec, where sec is a "closeness parameter" relating to statistical security. Thus distributed decryption will work if 2^{sec} · B_dec < q_0/2. Below we specify the basic algorithms for the BGV scheme; we will then discuss the extensions needed to cope with the full syntax of our scheme in Definition 2.

Before presenting the methods we need to pause briefly to remind the reader about modulus switching. A ciphertext at level l is given by a pair c = (c_0, c_1) ∈ R_{q_l}^2, and the decryption procedure computes, for the global secret key sk ∈ R,

$$[c_0 - \mathsf{sk} \cdot c_1]_{q_l} = c_0 - \mathsf{sk} \cdot c_1 \pmod{q_l},$$
where we take the symmetric modular operation in the range [-q_l/2, ..., q_l/2]. The value [c_0 - sk · c_1]_{q_l} can be interpreted as an element in R, and the associated noise value of the ciphertext is the canonical norm of this element. After each homomorphic operation the norm of the noise in the ciphertexts increases. To reduce it, the modulus switching technique [11, 10] is used. This procedure takes as input a ciphertext c = (c_0, c_1) ∈ R_{q_l}^2 with estimated noise ν and transforms it into a ciphertext c' ∈ R_{q_{l'}}^2 at level l', with noise magnitude ν', by scaling c by a factor q_{l'}/q_l and then rounding to get back an integer ciphertext. The ciphertext c' = (c'_0, c'_1) satisfies [c'_0 - sk · c'_1]_{q_{l'}} ≡ [c_0 - sk · c_1]_{q_l} (mod p) and ν' < ν. This modulus switching operation corresponds to our operation SHE.LowerLevel from Definition 2.
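The rounding step of modulus switching can be sketched in the degenerate ring Z (N = 1) with toy, insecure numbers. Each centred coefficient is scaled by q_{l'}/q_l and rounded to the nearest integer congruent to it modulo p. The sketch assumes q_l ≡ q_{l'} ≡ 1 (mod p), which is what makes the decrypted value keep its residue mod p; the modulus factor 85001 is not necessarily prime, and all names are hypothetical.

```python
import random
from fractions import Fraction

def centered(x, q):
    """Representative of x mod q in [-q/2, q/2)."""
    return (x + q // 2) % q - q // 2

def switch_coeff(x, q, q_new, p):
    """Scale x by q_new/q and round to the nearest integer congruent to x mod p."""
    xc = centered(x, q)
    base = round(Fraction(xc * q_new, q))   # exact rounding of the scaled value
    delta = (xc - base) % p                 # shift restoring the residue mod p
    if delta > p // 2:
        delta -= p
    return base + delta

def mod_switch(c, q, q_new, p):
    return tuple(switch_coeff(ci, q, q_new, p) % q_new for ci in c)

def decrypt(c, sk, q, p):
    return centered(c[0] - sk * c[1], q) % p

p = 17
q_new = 17 * 1000 + 1            # 17001 ≡ 1 (mod p)
q = q_new * (17 * 5000 + 1)      # extra factor 85001 ≡ 1 (mod p), so q ≡ q_new (mod p)
sk, m, u = -1, 5, 3
c1 = random.randrange(q)
c0 = (m + p * u + sk * c1) % q   # c0 - sk*c1 = m + p*u = 56 (mod q)
c_low = mod_switch((c0, c1), q, q_new, p)
print(decrypt((c0, c1), sk, q, p), decrypt(c_low, sk, q_new, p))  # 5 5
```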

1. SHE.KeyGen(1^κ) → (pk, ek, sk): Outputs a secret key sk ← HWT(h, N) and a common public key pk = (a, b) such that a ← U(q_L, N) and b = a · sk + p · e, with e ← DG(σ², N). This algorithm also outputs the evaluation key ek, which consists of N + 1 public "key-switching matrices": W_{sk²→sk} and W_{σ(sk)→sk} for each of the N elements σ ∈ Gal. See [31] for how these are defined.

2. SHE.Enc_pk(m) → (c, L): Given a plaintext m ∈ R_p, the encryption algorithm samples (v, e_0, e_1) ← RC(0.5, σ², N) and then computes, in R_{q_L},

$$c_0 = b \cdot v + p \cdot e_0 + m \quad \text{and} \quad c_1 = a \cdot v + p \cdot e_1.$$

3. SHE.Dec_sk(c, l) → m': Note that this algorithm is never called in our scheme; we present it here only to define correctness and what we mean by the message associated to a ciphertext. The algorithm takes as input a ciphertext c = (c_0, c_1) ∈ R_{q_l}^2 and outputs a plaintext m' ∈ R_p. It uses the secret key sk to compute

$$\mu = c_0 - \mathsf{sk} \cdot c_1 = m' + p \cdot (e \cdot v + e_0 - \mathsf{sk} \cdot e_1) = m' + p \cdot u$$

in R_{q_l}, and then obtains m' = (μ mod p). We denote by ν the estimated noise magnitude of μ, measured in the canonical embedding norm, and we require that ν ≤ B_dec. This decryption procedure will work correctly if B_dec < q_0/2.

4. SHE.Eval_ek(C, (c_1, l_1), ..., (c_k, l_k)): This consists of three separate algorithms, SHE.Add, SHE.Mult and SHE.ScalarMult, for homomorphically evaluating addition and multiplication gates.

- SHE.Add_ek((c_1, l_1), (c_2, l_2)): It produces a ciphertext c_Add in R_{q_l}^2 with l = min{l_1, l_2}. This is performed by first applying c'_i = SHE.LowerLevel_ek((c_i, l_i), l) for i = 1, 2 and then taking the coordinate-wise addition of c'_1 and c'_2. The noise magnitude of the resulting ciphertext is at most the sum of the noise in c_1 and c_2.

- SHE.Mult_ek((c_1, l_1), (c_2, l_2)): This produces a ciphertext c_Mult in R_{q_l}^2 with l = min{l_1, l_2} - 1. This is done in one of two ways (so as to minimize the overall parameter sizes in our scheme).

• If l ≠ 0, then one first applies c'_i = SHE.LowerLevel_ek((c_i, l_i), l) for i = 1, 2, and then the resulting ciphertexts are tensored. This results in a ciphertext c which is a vector of higher dimension [12], corresponding to a valid ciphertext of the SIMD-product m_1 · m_2 of the associated plaintexts with respect to a secret key sk' that is the tensor product of the secret key sk with itself. The key switching procedure [31] is then applied, using the matrix W_{sk²→sk}, to obtain a valid ciphertext c_Mult ∈ R_{q_l}^2 with respect to the original secret key sk. The noise magnitude in c_Mult is approximately the product of the norms of the noise in c'_1 and c'_2.

• If l = 0 (i.e. min{l_1, l_2} = 1), then one applies the tensor operation to c_1 and c_2 directly, then the key switching is performed, and only then is a SHE.LowerLevel operation performed. This results in us needing a larger prime p_1 than one would otherwise need but, more importantly, a smaller p_0.

- SHE.ScalarMult_ek((c, l), α): If c = (c_0, c_1), then one can obtain a homomorphic scalar multiplication by evaluating c' = (α · c_0, α · c_1). This procedure increases the noise, but not by as much as a normal multiplication. Therefore we shall ignore the noise increase produced by scalar multiplication in our analysis.

Using the evaluation key we can also define an additional homomorphic operation, as in [30, 31]:

- SHE.Permute_ek((c, l), σ) → (c_permute, l): Given σ ∈ Gal and a ciphertext c = (c_0, c_1) ∈ R_{q_l}^2 corresponding to a plaintext m ∈ R_p, this generates a ciphertext c'_permute = (σ(c_0), σ(c_1)) ∈ R_{q_l}^2 corresponding to σ(m), with respect to the secret key σ(sk). Key switching is then applied, using the key-switching matrix W_{σ(sk)→sk}, to produce a ciphertext c_permute decryptable under sk.
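The following toy Python sketch implements SHE.KeyGen, SHE.Enc and the decryption equation μ = c_0 - sk · c_1 from items 1 to 3, together with single-level SHE.Add and SHE.ScalarMult by an integer constant, in Z_q[x]/(x^N + 1). It is a deliberately insecure illustration and not the scheme's implementation: N = 8, p = 17, q = 2^40, ternary noise stands in for the ZO and DG samples, and there is no modulus chain or evaluation key. All names and parameters are assumptions.

```python
import random

N, P, Q = 8, 17, 2**40             # toy, insecure parameters (assumption)

def polymul(a, b):
    """Product in Z_Q[x]/(x^N + 1): x^N wraps around to -1."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj
    return [r % Q for r in res]

def add(*polys):
    return [sum(cs) % Q for cs in zip(*polys)]

def scale(k, a):
    return [k * x % Q for x in a]

def small():
    """Ternary noise standing in for the ZO / DG samples (assumption)."""
    return [random.choice((-1, 0, 0, 1)) for _ in range(N)]

def keygen(h=4):
    sk = [0] * N
    for pos in random.sample(range(N), h):              # sk <- HWT(h, N)
        sk[pos] = random.choice((-1, 1))
    a = [random.randrange(Q) for _ in range(N)]          # a <- U(q, N)
    b = add(polymul(a, sk), scale(P, small()))           # b = a*sk + p*e
    return sk, (a, b)

def encrypt(pk, m):
    a, b = pk
    v, e0, e1 = small(), small(), small()
    c0 = add(polymul(b, v), scale(P, e0), m)             # c0 = b*v + p*e0 + m
    c1 = add(polymul(a, v), scale(P, e1))                # c1 = a*v + p*e1
    return c0, c1

def decrypt(sk, c):
    c0, c1 = c
    mu = add(c0, scale(-1, polymul(sk, c1)))             # mu = c0 - sk*c1 (mod Q)
    return [((x + Q // 2) % Q - Q // 2) % P for x in mu]  # centred lift, then mod p

def he_add(c, d):                  # SHE.Add for two ciphertexts at the same level
    return add(c[0], d[0]), add(c[1], d[1])

def he_scalar(k, c):               # SHE.ScalarMult by an integer constant k
    return scale(k, c[0]), scale(k, c[1])

sk, pk = keygen()
m1 = [random.randrange(P) for _ in range(N)]
m2 = [random.randrange(P) for _ in range(N)]
print(decrypt(sk, encrypt(pk, m1)) == m1)  # True
```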




7.3 Defining SHE.Pack and SHE.Unpack for BGV


Despite our scheme being a packed SHE scheme, it can still evaluate unpacked ciphertexts; indeed many of the instances of packed SHE schemes were originally conceived in the unpacked case by taking the map χ to be χ(m) = (m, m, ..., m), i.e. the diagonal embedding. For example, this is the case with the schemes in [39, 12, 11, 9], all of which have packed counterparts. However, such a choice of χ is not efficient if one is interested in packing and unpacking encryptions of elements in F_p. We wish to define two functions, SHE.Pack and SHE.Unpack. The first takes N ciphertexts c_i at level l_i with associated plaintext vectors χ(m_i) for m_i ∈ F_p, and produces a single ciphertext c at level min(l_i) with the associated plaintext vector m = (m_1, ..., m_N) ∈ M. The second function performs the reverse operation.

In what follows we let e_i denote the i-th unit vector in M, i.e. the element which is zero except for a one in the i-th position. To ease notation we let ⊕ and ⊙ denote the operations of applying SHE.Add and SHE.Mult/SHE.ScalarMult respectively; we also let σ(c) denote applying the SHE.Permute operation to a ciphertext c and map σ ∈ Gal.

If we define χ by the diagonal embedding, then SHE.Pack can be defined in the following way:

$$\mathsf{SHE.Pack}(c_1, \ldots, c_N) = \bigoplus_{i=1}^{N} e_i \odot c_i,$$

i.e. SHE.Pack is an O(N) operation. However, SHE.Unpack needs to be performed as follows:

$$\mathsf{SHE.Unpack}(c) = \Big( \bigoplus_{j=1}^{N} \sigma_{1\to j}(e_1 \odot c),\ \ldots,\ \bigoplus_{j=1}^{N} \sigma_{N\to j}(e_N \odot c) \Big),$$

i.e. SHE.Unpack is an O(N²) operation. On the other hand, if we define χ to be the map χ(m) = (m, 0, ..., 0), then we can define SHE.Pack and SHE.Unpack by the following O(N) operations:

$$\mathsf{SHE.Pack}(c_1, \ldots, c_N) = \bigoplus_{i=1}^{N} \sigma_{1\to i}(c_i), \qquad \mathsf{SHE.Unpack}(c) = \big( e_1 \odot c,\ \sigma_{2\to 1}(e_2 \odot c),\ \ldots,\ \sigma_{N\to 1}(e_N \odot c) \big).$$

Thus we will use the mapping χ(m) = (m, 0, ..., 0) in our proposal.
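The effect of SHE.Pack and SHE.Unpack for χ(m) = (m, 0, ..., 0) can be simulated on plaintext slot vectors, modelling ⊕ as slot-wise addition, e_i ⊙ c as masking, and σ_{i→j} as the slot permutation of a Galois element. The sketch assumes m = 16 (so N = 8), indexes slot i by the evaluation point ζ^{j_i} with j_i the ith odd residue mod m, and uses the fact that σ_k places at ζ^j the value previously found at ζ^{jk}. It works on plaintexts only (no encryption, no reduction mod p), and all names are hypothetical.

```python
M = 16                                   # toy cyclotomic index (assumption): N = 8 slots
ODD = list(range(1, M, 2))               # slot i <-> evaluation point zeta^ODD[i]
N = len(ODD)

def permute(slots, k):
    """Slot action of sigma_k: the new value at zeta^j is the old value at zeta^(j*k)."""
    pos = {j: idx for idx, j in enumerate(ODD)}
    return [slots[pos[j * k % M]] for j in ODD]

def move(slots, src, dst):
    """sigma_{src->dst}: the Galois element sending slot src to slot dst."""
    k = ODD[src] * pow(ODD[dst], -1, M) % M
    return permute(slots, k)

def mask(slots, i):                      # e_i ⊙ c
    return [v if idx == i else 0 for idx, v in enumerate(slots)]

def add(u, v):                           # ⊕ (slot-wise addition)
    return [x + y for x, y in zip(u, v)]

def chi(x):                              # chi(m) = (m, 0, ..., 0)
    return [x] + [0] * (N - 1)

def pack(cts):                           # N permutations and N additions: O(N)
    out = [0] * N
    for i, c in enumerate(cts):
        out = add(out, move(c, 0, i))
    return out

def unpack(c):                           # N masks and N permutations: O(N)
    return [move(mask(c, i), i, 0) for i in range(N)]

xs = [3, 1, 4, 1, 5, 9, 2, 6]
packed = pack([chi(x) for x in xs])
print(packed)                                   # [3, 1, 4, 1, 5, 9, 2, 6]
print(unpack(packed) == [chi(x) for x in xs])   # True
```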

7.4  Distributed  Decryption  Protocol 

All that remains to define our threshold L-levelled packed SHE system based on BGV is to present the distributed decryption protocol. Note that we do not use the key-homomorphic properties of RLWE schemes as previously used in [1, 35, 2]. Instead, we follow the approach of [6], where the authors construct a threshold variant of Regev's cryptosystem [38]; we adapt this method to our situation.

At  a  high  level  the  method  works  as  follows:  we  modify  the  SHE.KeyGen  algorithm  so  that  it  also  outputs  for  each 
party  Pi  a  key  dk^  for  performing  distributed  decryption.  The  key  dk^  consists  of  two  components;  i.e.  dk^  =  (sk^,  k^). 
The  values  sk^  form  a  Shamir  sharing  over  the  ring  of  the  secret  key  sk,  with  threshold  t.  The  value  are  the 
associated  keys  for  the  PRSS  described  above.  Given  a  common  ciphertext  c  =  (cq  ,  ci )  G  Rq^  as  input  (for  decryption), 
the  parties  first  apply  SHE. LowerLevel  to  reduce  the  ciphertext  to  level  zero.  Then  each  party  Pi  computes  a  decryption 
share  fii  using  his  private  sk^  and  a  PRSS  over  Rq^  as  described  earlier.  The  underlying  PRF  we  assume  produces 
values  in  the  range 

r  (2-P-l)-Bdec  (2-P-l).Bdecl 

i  p-C)  ’  P-0  v 

where $B_{\mathsf{dec}}$ is the bound on the canonical norm of an element being decrypted mentioned earlier. It is therefore also an upper bound on the size of the coefficients of the noise polynomial reconstructed during the standard decryption procedure. See Section 8 for a detailed discussion of $B_{\mathsf{dec}}$. Because the PRSS sums $\binom{n}{t}$ PRF outputs, this choice of range for the underlying PRF family means that the values output by the PRSS will be shares of elements in $R_{q_0}$ whose coefficients lie in the range $[-(2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}/p,\ (2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}/p]$. Note that $\binom{n}{t}$ grows very fast for $t \approx n/2$. For the above range of the PRF to be suitably large, we therefore require that $n$ is small. In our discussion we implicitly assume $n$ to be small, say $n < 10$.
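The PRSS is used above as a black box. For concreteness, here is a minimal sketch, under stated assumptions, of a standard replicated-key PRSS in the style of Cramer, Damgård and Ishai that produces Shamir shares of a bounded value $r$. It is not the authors' implementation, and it rests on the following assumptions:

- scalars modulo a toy prime `Q` stand in for $R_{q_0}$;
- HMAC-SHA256 stands in for the PRF $\psi$;
- the keys are derived deterministically, which is insecure and used only for reproducibility;
- all function names are hypothetical.

```python
import hashlib
import hmac
from itertools import combinations
from math import comb

Q = 2**61 - 1  # toy prime modulus standing in for R_{q_0} (assumption)


def toy_keys(n, t):
    """One key per set A of n - t parties (C(n, t) sets). Deterministic and NOT secure."""
    return {frozenset(A): hashlib.sha256(repr(A).encode()).digest()
            for A in combinations(range(1, n + 1), n - t)}


def psi(key, label, bound):
    """Hypothetical PRF psi_k(label) with integer output in [-bound, bound]."""
    d = hmac.new(key, label, hashlib.sha256).digest()
    return int.from_bytes(d, "big") % (2 * bound + 1) - bound


def f_A(A, x, n):
    """Degree-t polynomial with f_A(0) = 1 and f_A(j) = 0 for every j not in A."""
    num, den = 1, 1
    for j in range(1, n + 1):
        if j not in A:
            num = num * (j - x) % Q
            den = den * j % Q
    return num * pow(den, -1, Q) % Q


def prss_share(i, keys, label, n, t, R):
    """P_i's Shamir share of r = sum_A psi_{k_A}(label), using only keys with i in A."""
    bound = R // comb(n, t)  # so that |r| <= C(n, t) * bound <= R
    return sum(psi(k, label, bound) * f_A(A, i, n)
               for A, k in keys.items() if i in A) % Q


def open_centered(points):
    """Lagrange interpolation at 0 from {i: share}, lifted to (-Q/2, Q/2]."""
    total = 0
    for i, y in points.items():
        lam = 1
        for j in points:
            if j != i:
                lam = lam * j % Q * pow((j - i) % Q, -1, Q) % Q
        total = (total + y * lam) % Q
    return total - Q if total > Q // 2 else total
```

Each party uses only the keys of the sets $A$ that contain it, yet any $t+1$ shares open to the same bounded sum. In the protocol, the label would be the agreed vector $\mathbf{t}$ derived from the ciphertext.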

Recall that distributed decryption is defined by two algorithms, SHE.ShareDec and SHE.ShareCombine. These are defined by the following procedures:

- SHE.ShareDec$_{\mathsf{dk}_i}((c, l)) \to \mu_i$:

1. Apply SHE.LowerLevel$_{\mathsf{ek}}((c, l), 0)$ to obtain the ciphertext $(c_0, c_1)$ at level zero (unless $c$ is already at level zero).

2. Compute $\nu_i = [\nu]_i = [c_0 - \mathsf{sk} \cdot c_1]_i = c_0 - \mathsf{sk}_i \cdot c_1$, where the computation is in $R_{q_0}$.

3. Execute the PRSS, using the PRF keys $\mathbf{k}_i$, to obtain a Shamir share $r_i = [r]_i$ of a "random" value $r \in R_{q_0}$ such that $r = \sum_{A} \psi_{k_A}(\mathbf{t})$ and $\|r\|_\infty \le (2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}/p$. Here $\mathbf{t}$ is some agreed vector of values which is a function of the input ciphertext $c$.

4. Compute $\mu_i = [\mu]_i = [\nu + p \cdot r]_i = \nu_i + p \cdot r_i$ and output $\mu_i$ as the decryption share.

- SHE.ShareCombine$((c, l), \{\mu_i\}_{i \in [1, \ldots, n]}) \to m'$: given a set of $n$ decryption shares $\mu_i$ and (in the malicious setting) an error correction procedure, reconstruct $\mu = \nu + p \cdot r$ by applying the error correction procedure to $\{\mu_i\}_{i \in [1, \ldots, n]}$, and output $m' = (\mu \bmod p)$.

Note that decryption will work as long as the reconstructed value $\mu$ has infinity norm less than $q_0/2$, i.e. we require $2^{\mathsf{exp}} \cdot B_{\mathsf{dec}} < q_0/2$ (see the next section for details).
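To make the two procedures concrete, here is a toy, passively secure sketch. It relies on the following assumptions:

- ring elements are replaced by integers modulo a prime `Q`, i.e. a single coefficient;
- a trusted dealer shares $\mathsf{sk}$ and $r$, in place of SHE.KeyGen and the PRSS;
- there is no error correction;
- `P_MSG`, `EXP`, `toy_run` and the other names are hypothetical.

The toy parameters satisfy $2^{\mathsf{exp}} \cdot B_{\mathsf{dec}} < Q/2$.

```python
import random

Q = 2**61 - 1   # toy prime modulus standing in for q_0 (assumption)
P_MSG = 65537   # toy plaintext modulus p
EXP = 20        # toy smudging exponent "exp"


def shamir_share(secret, n, t, rng):
    """Degree-t Shamir sharing of secret (mod Q) for parties 1..n."""
    coeffs = [secret % Q] + [rng.randrange(Q) for _ in range(t)]
    return {i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q
            for i in range(1, n + 1)}


def open_centered(points):
    """Lagrange interpolation at 0, lifted to (-Q/2, Q/2]."""
    total = 0
    for i, y in points.items():
        lam = 1
        for j in points:
            if j != i:
                lam = lam * j % Q * pow((j - i) % Q, -1, Q) % Q
        total = (total + y * lam) % Q
    return total - Q if total > Q // 2 else total


def share_dec(c0, c1, sk_i, r_i):
    """Steps 2 and 4 of SHE.ShareDec: nu_i = c0 - sk_i*c1, then mu_i = nu_i + p*r_i."""
    nu_i = (c0 - sk_i * c1) % Q
    return (nu_i + P_MSG * r_i) % Q


def share_combine(mu_shares):
    """Passive-case SHE.ShareCombine: reconstruct mu = nu + p*r and output mu mod p."""
    return open_centered(mu_shares) % P_MSG


def toy_run(m, n, t, seed=1, subset=None):
    """End-to-end toy run for 0 <= m < p; a dealer replaces KeyGen and the PRSS."""
    rng = random.Random(seed)
    sk, c1 = rng.randrange(Q), rng.randrange(Q)
    e = rng.randint(-100, 100)
    c0 = (m + P_MSG * e + sk * c1) % Q          # so c0 - sk*c1 = nu = m + p*e
    b_dec = 101 * P_MSG                          # |nu| <= B_dec
    r_max = (2**EXP - 1) * b_dec // P_MSG        # |r| <= (2^exp - 1) * B_dec / p
    r = rng.randint(-r_max, r_max)
    sk_sh, r_sh = shamir_share(sk, n, t, rng), shamir_share(r, n, t, rng)
    mu = {i: share_dec(c0, c1, sk_sh[i], r_sh[i]) for i in sk_sh}
    chosen = subset if subset is not None else list(mu)[: t + 1]
    return share_combine({i: mu[i] for i in chosen})
```

Because $c_0$ and $c_1$ are public, $\nu_i = c_0 - \mathsf{sk}_i \cdot c_1$ is already a Shamir share of $\nu$. Adding $p \cdot r_i$ keeps the sharing linear and leaves $\mu \bmod p$ unchanged.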

We pause to note the different situations in which one obtains correct message recovery from SHE.ShareCombine. In the case of passive adversaries we will show (in the next section) that the above distributed decryption procedure is secure as long as $t < n$. Since we are using Shamir sharing, in the presence of $t < n/3$ active corruptions we can correctly recover the message at the end of SHE.ShareCombine by using the natural error correction properties of the scheme (namely Reed-Solomon (RS) error correction).

When $t < n/2$ a little more work is involved. If an adversary sends an incorrect share then this can be detected, again because we are using Shamir as the underlying secret sharing scheme. At this point the parties execute a party elimination strategy in which they require each other to prove in zero-knowledge that the provided share is correct. Once the cheating parties have been identified, they are eliminated from the protocol and the protocol resumes. Thus for active adversaries and $t < n/2$ we may require a grand total of an extra $n^2 \cdot t$ zero-knowledge proofs to be constructed, irrespective of the size of the circuit in our main protocol; see Section 6 for more details.

7.5  Security  of  Our  Threshold  BGV  Instantiation 

Recall from earlier that we require four security properties:

-  Key  Generation  Security. 

-  Semantic  Security. 

-  Correct  Share  Decryption. 

-  Share  Simulation  Indistinguishability. 

We  now  discuss  each  of  these  in  turn. 

Key Generation Security: The required properties of the keys produced by the key generation algorithm follow from the security properties of the Shamir secret sharing scheme used to share $\mathsf{sk}$. We note that in our main protocol we assume an ideal functionality to distribute such keys, and so there is no "Key Generation" protocol to analyse.

Semantic Security: This follows from the standard semantic security of the BGV scheme. However, we need to deal with the fact that the adversary has access to shares of the underlying secret key and to the keys of the PRSS. A standard simulation shows that security in our setting reduces to that in the standard setting.

Correct Share Decryption: The infinity norm of the element $\mu = \nu + p \cdot r$ produced by the algorithm SHE.ShareCombine is bounded by $B_{\mathsf{dec}} + p \cdot (2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}/p = 2^{\mathsf{exp}} \cdot B_{\mathsf{dec}}$. If $2^{\mathsf{exp}} \cdot B_{\mathsf{dec}} < q_0/2$, then correct decryption will result.

Share Simulation Indistinguishability: We need to present a PPT algorithm (simulator) SHE.ShareSim with the following behaviour. Its inputs are:

- a ciphertext $c \in R_{q_l}^2$ with associated plaintext $m \in R_p$;
- a subset $I \subset \{1, \ldots, n\}$ such that $|I| = t$;
- a set of $t$ decryption shares $\{\mu_i\}_{i \in I}$, where $\mu_i = $ SHE.ShareDec$_{\mathsf{dk}_i}((c, l))$.

From these it must simulate the remaining $(n - t)$ decryption shares $\{\mu_j^*\}_{j \in \bar{I}}$ in such a way that the following two distributions are statistically indistinguishable:

$$\left( \{\mu_i\}_{i \in I}, \{\mu_j\}_{j \in \bar{I}} \right) \approx_s \left( \{\mu_i\}_{i \in I}, \{\mu_j^*\}_{j \in \bar{I}} \right),$$

where $\bar{I} = \{1, \ldots, n\} \setminus I$. That is, one cannot distinguish the real shares for the set $\bar{I}$ (as computed by the SHE.ShareDec algorithm) from the ones produced by the simulator. Moreover, we require the statistical distance between the two distributions to be bounded by $2^{-\mathsf{sec}}$. The simulator is constructed as follows:

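1. Let $K_I$ denote the set of keys for the PRSS that have been given to parties $P_i$ with $i \in I$, and let $K_{\bar{I}}$ denote the set of keys for the PRSS held by $P_j$ with $j \in \bar{I}$.

2. The simulator first computes

$$r' = \sum_{k \in K_I} \psi_k(\mathbf{t}) + \sum_{k \in K_{\bar{I}} \setminus K_I} r'_k,$$

where each $r'_k \in R_{q_0}$ is chosen uniformly at random subject to

$$\|r'_k\|_\infty \le \frac{(2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}}{p \cdot \binom{n}{t}}.$$

In this way $\|r'\|_\infty \le (2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}/p$.

3. Let $\mu^* = m + p \cdot r'$. For each $j \in \bar{I}$, the simulator outputs $\mu_j^*$ such that $(\{\mu_i\}_{i \in I}, \{\mu_j^*\}_{j \in \bar{I}})$ is a consistent vector of shares of $\mu^*$. In other words, the simulator deterministically computes consistent shares for the honest parties via Lagrange interpolation of the $t + 1$ values $\mu^*$ and $\{\mu_i\}_{i \in I}$.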
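Step 3 is plain Lagrange interpolation. The sketch below assumes scalars modulo a toy prime `Q` in place of $R_{q_0}$, and its function names are hypothetical. The $t$ corrupt shares, together with $\mu^*$ placed at evaluation point 0, fix a polynomial of degree $t$, and the honest shares are its values at the remaining points.

```python
Q = 2**61 - 1  # toy prime modulus standing in for R_{q_0} (assumption)


def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through {i: y_i}, mod Q."""
    total = 0
    for i, y in points.items():
        lam = 1
        for j in points:
            if j != i:
                lam = lam * ((x - j) % Q) % Q * pow((i - j) % Q, -1, Q) % Q
        total = (total + y * lam) % Q
    return total


def simulate_honest_shares(mu_star, corrupt_shares, n):
    """Complete t corrupt shares and mu* (at point 0) to a consistent degree-t sharing."""
    points = {0: mu_star % Q, **corrupt_shares}   # t + 1 points fix the polynomial
    return {j: interpolate_at(points, j)
            for j in range(1, n + 1) if j not in corrupt_shares}
```

Any $t + 1$ of the resulting $n$ shares reconstruct $\mu^*$, which is exactly the consistency property Step 3 asks for.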


Before  proving  the  properties  of  the  simulation,  we  recall  the  following  lemma  from  [2]: 

Lemma 1 (Smudging Lemma [2]). Let $B_1$ and $B_2$ be positive integers, let $e_1 \in [-B_1, B_1]$ be a fixed integer and let $e_2 \in [-B_2, B_2]$ be chosen uniformly at random. Then the statistical distance between the distribution of $e_2$ and that of $e_2 + e_1$ is at most $B_1/B_2$.
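The bound in Lemma 1 is easy to check exactly for small parameters. The sketch below (hypothetical helper name, exact rational arithmetic) computes the statistical distance between $e_2$ and $e_2 + e_1$ by brute force. For $|e_1| \le 2B_2 + 1$ it equals $|e_1|/(2B_2 + 1)$, which never exceeds $B_1/B_2$.

```python
from fractions import Fraction


def shift_distance(e1, B2):
    """Exact statistical distance between U[-B2, B2] and e1 + U[-B2, B2] on the integers."""
    w = Fraction(1, 2 * B2 + 1)               # mass of each support point
    lo, hi = min(-B2, -B2 + e1), max(B2, B2 + e1)
    total = Fraction(0)
    for x in range(lo, hi + 1):
        p_plain = w if -B2 <= x <= B2 else 0
        p_shift = w if -B2 + e1 <= x <= B2 + e1 else 0
        total += abs(p_plain - p_shift)
    return total / 2
```

In the proof below the lemma is applied coefficient by coefficient, with $B_2$ exponentially larger than $B_1$, so each coefficient contributes only a tiny distance.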

To prove the properties of the simulation, we first note that, similarly to the last stage of the simulation above, the real shares for the honest parties can be constructed (deterministically) from $\mu$ and the shares held by the $t$ dishonest parties. Thus, to prove indistinguishability of the real and simulated shares, it suffices to prove that $\mu^* = m + p \cdot r'$ and $\mu = \nu + p \cdot r$ are statistically close*.

To see that this is indeed the case, we first note two facts:

- $\nu + p \cdot r$ and $\nu + p \cdot r'$ are indistinguishable (by construction);
- $r'$ is uniform in an exponentially larger range than $\nu$ (recall that $\|\nu\|_\infty \le B_{\mathsf{dec}}$ and $\|r'\|_\infty \le (2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}/p$).

We apply the Smudging Lemma coefficient by coefficient and sum over the $N$ coefficients. The statistical distance between the distribution of $\nu + p \cdot r'$ and the uniform distribution of polynomials with coefficients in $[-(2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}, (2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}]$ is then at most $N/(2^{\mathsf{exp}} - 1)$.

To conclude the proof, we next claim that the distribution of $m + p \cdot r'$ is statistically indistinguishable from the uniform distribution of polynomials with coefficients in $[-(2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}, (2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}]$. The statistical distance between these two distributions is at most $N \cdot p/((2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}})$, which again follows from the Smudging Lemma together with $m \in R_p$. By the triangle inequality, the overall statistical distance between the distributions of $\mu^* = m + p \cdot r'$ and $\mu = \nu + p \cdot r$ is therefore upper bounded by $N \cdot (p + B_{\mathsf{dec}})/((2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}})$. Choose

$$\mathsf{exp} = \mathsf{sec} + \left\lceil \log_2 \frac{N \cdot (p + B_{\mathsf{dec}})}{B_{\mathsf{dec}}} \right\rceil.$$


Since $p < B_{\mathsf{dec}}$, this simplifies to $\mathsf{exp} = \mathsf{sec} + \log_2(N) + 1$. We can therefore ensure that the statistical distance is bounded by $2^{-\mathsf{sec}}$, which can be made arbitrarily small by our choice of $\mathsf{exp}$.
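The simplification can be checked in two lines, using only $p < B_{\mathsf{dec}}$ and the resulting choice $2^{\mathsf{exp}} = 2^{\mathsf{sec}} \cdot 2N$:

$$\frac{N \cdot (p + B_{\mathsf{dec}})}{B_{\mathsf{dec}}} = N \cdot \left(1 + \frac{p}{B_{\mathsf{dec}}}\right) < 2N,$$

$$\frac{N \cdot (p + B_{\mathsf{dec}})}{(2^{\mathsf{exp}} - 1) \cdot B_{\mathsf{dec}}} < \frac{2N}{2^{\mathsf{sec}} \cdot 2N - 1} \approx 2^{-\mathsf{sec}}.$$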

* For statistically close distributions $X \approx Y$ and any deterministic procedure $A$ applied to those distributions, it is the case that $A(X) \approx A(Y)$.
7.6 Batch Distributed Decryption

Using a well known technique presented in [4, 24], we can perform a batch of $t + 1 = O(n)$ distributed decryptions. This evaluates a batch of $t + 1$ Refresh gates at the communication cost of performing two distributed decryptions. The technique applies to our main MPC protocol if the Refresh gates in the batch are independent, meaning that the output wire of one does not lead to the input of another.

Given a value shared among the parties, its public reconstruction requires each party to send the share (of the value) it holds to every other party. This requires $n \cdot (n - 1)$ pair-wise communications of shares, so for $t + 1$ shared values the public reconstruction will require $O(n^3)$ pair-wise communications of shares. In what follows we show how this can be achieved at the cost of publicly reconstructing a single value (up to a factor of two), namely with a communication of $2 \cdot n \cdot (n - 1) = O(n^2)$ shares. The idea was used in the information theoretically secure MPC protocols of [4, 24].

Let $x^{(1)}, \ldots, x^{(t+1)}$ be $t + 1$ shared values. The protocol proceeds in three steps:

1. Expand. The $t + 1$ shared values are first "expanded" to $n$ shared values, say $y^{(1)}, \ldots, y^{(n)}$, by applying a linear function locally. Specifically, if the underlying LSS is Shamir, we interpret $x^{(1)}, \ldots, x^{(t+1)}$ as the coefficients of a polynomial of degree at most $t$, say $u(\cdot)$, and let $y^{(1)}, \ldots, y^{(n)}$ be $n$ distinct points on this polynomial. Obtaining $y^{(1)}, \ldots, y^{(n)}$ from $x^{(1)}, \ldots, x^{(t+1)}$ is a linear function. By (locally) applying the same linear function to their sharings of $x^{(1)}, \ldots, x^{(t+1)}$, the parties obtain sharings of $y^{(1)}, \ldots, y^{(n)}$.

2. Open privately. Each $y^{(i)}$ is reconstructed only towards $P_i$, which costs $O(n^2)$ communication of shares.

3. Broadcast. Every $P_i$ sends $y^{(i)}$ to every other party, which costs another $O(n^2)$ communication. Every party can then reconstruct $u(\cdot)$ and hence $x^{(1)}, \ldots, x^{(t+1)}$.

In our setting all of the above sharing is done using Shamir over the ring $R_{q_0}$, and the above can be carried out with no change to the underlying SHE scheme. Thus, assuming our initial circuit is large enough (i.e. there are enough independent Refresh gates at each level), we can obtain a performance improvement by a factor of $(t + 1)/2$.
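A toy sketch of the batch opening follows. It assumes a prime field in place of $R_{q_0}$, passive parties (so no error correction) and hypothetical function names, and calls the $t + 1$ values $x^{(1)}, \ldots, x^{(t+1)}$ and the expanded values $y^{(k)} = u(k)$. It checks that the $t + 1$ secrets are recovered even though only the $y^{(k)}$ are ever opened.

```python
import random

Q = 2**61 - 1  # toy prime field standing in for R_{q_0} (assumption)


def shamir_share(secret, n, t, rng):
    """Degree-t Shamir shares [s(1), ..., s(n)] of secret mod Q."""
    coeffs = [secret % Q] + [rng.randrange(Q) for _ in range(t)]
    return [sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q for i in range(1, n + 1)]


def open_at_zero(points):
    """Lagrange interpolation at 0 from {i: y_i}."""
    total = 0
    for i, y in points.items():
        lam = 1
        for j in points:
            if j != i:
                lam = lam * j % Q * pow((j - i) % Q, -1, Q) % Q
        total = (total + y * lam) % Q
    return total


def coefficients_from_points(points):
    """Coefficients (lowest degree first) of the polynomial through {x: y}, via Lagrange basis."""
    result = [0] * len(points)
    for i, y in points.items():
        basis, denom = [1], 1                      # basis = prod_{j != i} (X - j)
        for j in points:
            if j != i:
                basis = [((basis[k - 1] if k > 0 else 0) - j * (basis[k] if k < len(basis) else 0)) % Q
                         for k in range(len(basis) + 1)]
                denom = denom * (i - j) % Q
        scale = y * pow(denom, -1, Q) % Q
        for k, b in enumerate(basis):
            result[k] = (result[k] + scale * b) % Q
    return result


def batch_open(x_shares, n, t):
    """Open x^(1..t+1), where x_shares[j][i-1] is P_i's share of x^(j+1)."""
    # Local linear step: P_i's share of y^(k) = u(k) = sum_j x^(j+1) * k^j.
    y_shares = [[sum(x_shares[j][i] * pow(k, j, Q) for j in range(t + 1)) % Q
                 for i in range(n)] for k in range(1, n + 1)]
    # y^(k) is opened only towards P_k: n(n-1) shares sent in total.
    y = {k: open_at_zero({i + 1: y_shares[k - 1][i] for i in range(n)}) for k in range(1, n + 1)}
    # Each P_k sends y^(k) to everyone (another n(n-1)); any t+1 values fix u.
    return coefficients_from_points({k: y[k] for k in range(1, t + 2)})
```

The two communication rounds each move $n \cdot (n - 1)$ field elements, matching the $2 \cdot n \cdot (n - 1)$ count above. An actively secure variant would use all $n$ values of $y$ with error correction instead of just $t + 1$ of them.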


8  Parameter  Calculation 

In [31] a concrete set of parameters for the BGV SHE scheme was given for the case of binary message spaces and arbitrary $L$. In [22] this was adapted to the case of message space $R_p$ for 2-power cyclotomic rings, but only for schemes which could support one level of multiplication gates (i.e. for $L = 1$). In this section we combine these analyses to produce parameter estimates for the case we require: arbitrary $L$ and messages defined modulo a "large prime", e.g. $p \approx 2^{32}$, $2^{64}$ or $2^{128}$. We assume in this section that the reader is familiar with the analysis and algorithms from [31]; we mainly point out the differences in estimates for our case.

Our analysis will make extensive use of the following fact. Suppose $a \in R$ is chosen from a distribution whose coefficients have mean zero and standard deviation $\sigma$, and $\zeta_m$ is a primitive $m$-th root of unity. Then we can use $6 \cdot \sigma \cdot \sqrt{N}$ to bound $a(\zeta_m)$, and hence the canonical embedding norm of $a$. If we have two elements with variances $\sigma_1^2$ and $\sigma_2^2$, then we can bound the canonical norm of their product by $16 \cdot \sigma_1 \cdot \sigma_2 \cdot N$.
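These two bounds are heuristics, so an empirical sanity check (not a proof) can be reassuring. The sketch below makes the following assumptions:

- the ring is a power-of-two cyclotomic $\mathbb{Z}[X]/(X^N + 1)$, so $m = 2N$ and $\zeta_m = e^{i\pi/N}$;
- coefficients are drawn from a Gaussian;
- the function names are hypothetical.

It reports the largest normalised values seen over a number of random draws.

```python
import cmath
import math
import random


def eval_at(coeffs, zeta):
    """a(zeta) = sum_k a_k zeta^k via Horner's rule."""
    acc = 0j
    for c in reversed(coeffs):
        acc = acc * zeta + c
    return acc


def worst_ratios(N, s1, s2, trials, seed=0):
    """Largest |a(zeta)|/(s1 sqrt N) and |a(zeta) b(zeta)|/(s1 s2 N) over random draws."""
    rng = random.Random(seed)
    zeta = cmath.exp(1j * math.pi / N)   # primitive 2N-th root of unity, a root of X^N + 1
    single = product = 0.0
    for _ in range(trials):
        a = eval_at([rng.gauss(0, s1) for _ in range(N)], zeta)
        b = eval_at([rng.gauss(0, s2) for _ in range(N)], zeta)
        single = max(single, abs(a) / (s1 * math.sqrt(N)))
        # Evaluation at zeta is a ring homomorphism, so (ab)(zeta) = a(zeta) * b(zeta).
        product = max(product, abs(a * b) / (s1 * s2 * N))
    return single, product
```

For Gaussian coefficients $|a(\zeta_m)|^2$ has mean $N \sigma^2$, so exceeding $6 \cdot \sigma \cdot \sqrt{N}$ is an event of probability roughly $e^{-36}$. This is why the constant 6 is a comfortable high-probability bound.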

Recall from Section 7 that we require a chain of moduli $q_0 < q_1 < \cdots < q_L$ corresponding to each level of the scheme, where $q_L = q_0 \cdot \prod_{i=1}^{L} p_i$. Note that we evaluate a depth $L$ circuit using a chain of $L + 1$ moduli. Also note that we apply a SHE.LowerLevel (a.k.a. modulus switch) algorithm before a multiplication operation, except when multiplying at level one. This often leads to lower noise values in practice (which a practical instantiation can make use of). In addition, it eliminates the need to perform a modulus switch after encryption.

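We utilize the following constants described in [22], which are worked out for the case of a message space defined modulo $p$. The constants in [22] make use of an additional parameter $n$, arising from the key generation procedure; in our case we can take this constant equal to one, and the expressions below are written with it set to one:

$$B_{\mathsf{clean}} = N \cdot p/2 + p \cdot \sigma \cdot \left(16 \cdot N/\sqrt{2} + 6 \cdot \sqrt{N} + 16 \cdot \sqrt{h \cdot N}\right),$$

$$B_{\mathsf{scale}} = p \cdot \sqrt{3 \cdot N} \cdot \left(1 + 8 \cdot \sqrt{h}/3\right),$$

$$B_{\mathsf{KS}} = p \cdot \sigma \cdot N \cdot \left(1.49 \cdot \sqrt{h \cdot N} + 2.11 \cdot h + 5.54 \cdot \sqrt{h} + 1.96 \cdot \sqrt{N} + 4.62\right).$$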
The constants are used in the following manner. A freshly encrypted ciphertext at level $L$ has noise bounded by $B_{\mathsf{clean}}$. In the worst case, applying SHE.LowerLevel to a ciphertext at level $l$ with noise bounded by $B'$ yields a new ciphertext at level $l - 1$ with noise bounded by

$$\frac{B'}{p_l} + B_{\mathsf{scale}}.$$

Applying the tensor product multiplication operation to ciphertexts of a given level $l$ with noise $B_1$ and $B_2$ yields a new ciphertext with noise given by

$$B_1 \cdot B_2 + \frac{B_{\mathsf{KS}} \cdot q_l}{P} + B_{\mathsf{scale}},$$

where $P$ is a value to be determined later. As in [31] we define a small "wiggle room" $\xi$, which we set equal to eight. This allows a number of additions to be performed without needing to account for them individually in our analysis.


A general evaluation procedure begins with a freshly encrypted ciphertext at level $L$ with noise $B_{\mathsf{clean}}$. When entering the first multiplication operation, we first apply a SHE.LowerLevel operation to reduce the noise to our universal bound $B$. We therefore require

$$\frac{\xi \cdot B_{\mathsf{clean}}}{p_L} + B_{\mathsf{scale}} < B, \quad \text{i.e.} \quad p_L > \frac{\xi \cdot B_{\mathsf{clean}}}{B - B_{\mathsf{scale}}}. \qquad (1)$$


We now turn to the SHE.LowerLevel operation which occurs before a multiplication gate at level $l \in [2, \ldots, L - 1]$. We perform a worst case analysis and assume that the input ciphertexts are at level $l - 1$. We can then assume that the input to the tensoring operation in the previous multiplication gate (just after the previous SHE.LowerLevel) was bounded by $B$. The output noise from the previous multiplication gate for each input ciphertext is therefore bounded by $B^2 + B_{\mathsf{KS}} \cdot q_l/P + B_{\mathsf{scale}}$. This means the noise on entering the SHE.LowerLevel operation is bounded by $\xi$ times this value, and so to maintain our invariant we require

$$\frac{\xi \cdot B^2 + \xi \cdot B_{\mathsf{scale}}}{p_l} + \frac{\xi \cdot B_{\mathsf{KS}} \cdot q_l}{P \cdot p_l} + B_{\mathsf{scale}} < B.$$


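Rearranging this into a quadratic equation in $B$ we have

$$\frac{\xi}{p_l} \cdot B^2 - B + \left( \frac{\xi \cdot B_{\mathsf{scale}}}{p_l} + B_{\mathsf{scale}} + \frac{\xi \cdot B_{\mathsf{KS}} \cdot q_{l-1}}{P} \right) < 0.$$

We denote the constant term in this equation by $R_{l-1}$. We now assume that all primes $p_l$ are of roughly the same size, and note that we need only satisfy the inequality for the largest modulus, $l = L - 1$. We fix $R_{L-2}$ by trying to ensure that it is close to $B_{\mathsf{scale}} \cdot (1 + \xi/p_{L-1}) \approx B_{\mathsf{scale}}$, so we set $R_{L-2} = (1 + 2^{-3}) \cdot B_{\mathsf{scale}} \cdot (1 + \xi/p_{L-1})$. Since $B_{\mathsf{scale}} \cdot (1 + \xi/p_{L-1}) \approx B_{\mathsf{scale}}$, we obtain

$$P \approx \frac{8 \cdot \xi \cdot B_{\mathsf{KS}} \cdot q_{L-2}}{B_{\mathsf{scale}}}. \qquad (2)$$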

To ensure we have a solution we require $1 - 4 \cdot \xi \cdot R_{L-2}/p_{L-1} > 0$, which implies we should take, for $i = 2, \ldots, L - 1$,

$$p_i \approx 4 \cdot \xi \cdot R_{L-2} \approx 32 \cdot B_{\mathsf{scale}}. \qquad (3)$$
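The roots of the quadratic also show where a bound of the form $B \approx 2 \cdot B_{\mathsf{scale}}$ (used below) comes from. The following short derivation assumes $p_{L-1}$ is taken right at the threshold $4 \cdot \xi \cdot R_{L-2}$:

$$B = \frac{p_{L-1}}{2\xi} \left( 1 \pm \sqrt{1 - \frac{4 \cdot \xi \cdot R_{L-2}}{p_{L-1}}} \right),$$

and at $p_{L-1} = 4 \cdot \xi \cdot R_{L-2}$ the two roots coincide:

$$B = \frac{p_{L-1}}{2\xi} = 2 \cdot R_{L-2} \approx 2 \cdot B_{\mathsf{scale}}.$$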
Recall that the final multiplication is executed in a different manner: we do not modulus switch before the multiplication, but afterwards. We analyse the implication of this for the size of $p_1$, from the point of view of our concrete application to our MPC protocol. The final multiplication will be of a ciphertext with noise approximately $\xi \cdot B^2$ and a ciphertext with noise $B$ (namely $C_i$). The input to the final key switch will therefore have noise value approximately $\xi \cdot B^3$; this simplifying assumption makes little difference to the final values. The output noise from the key switch is then equal to

$$\xi \cdot B^3 + B_{\mathsf{scale}} + \frac{B_{\mathsf{KS}} \cdot q_1}{P}.$$

We then perform a modulus switch to obtain a ciphertext as output of the multiplication gate, with noise bounded by

$$\frac{\xi \cdot B^3 + B_{\mathsf{scale}}}{p_1} + \frac{B_{\mathsf{KS}} \cdot p_0}{P} + B_{\mathsf{scale}}.$$

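We again require this to be less than $B$, so we now have the cubic equation

$$\frac{\xi}{p_1} \cdot B^3 - B + \left( \frac{B_{\mathsf{scale}}}{p_1} + B_{\mathsf{scale}} + \frac{B_{\mathsf{KS}} \cdot p_0}{P} \right) < 0.$$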


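Substituting in our existing estimate for $P$, namely $8 \cdot \xi \cdot B_{\mathsf{KS}} \cdot q_{L-2}/B_{\mathsf{scale}}$, and assuming $L > 2$ and $p_1 > B_{\mathsf{scale}}$ (i.e. $q_{L-2} > B_{\mathsf{scale}} \cdot p_0$), we find the inequality is roughly equivalent to

$$\frac{\xi}{p_1} \cdot B^3 - B + \left( \frac{B_{\mathsf{scale}}}{p_1} + B_{\mathsf{scale}} + \frac{B_{\mathsf{scale}} \cdot p_0}{8 \cdot \xi \cdot q_{L-2}} \right) \approx \frac{\xi}{p_1} \cdot B^3 - B + \left( \frac{B_{\mathsf{scale}}}{p_1} + B_{\mathsf{scale}} \right) < 0.$$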


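If we set $B \approx 2 \cdot B_{\mathsf{scale}}$, then this means we have (approximately)

$$\frac{\xi}{p_1} \cdot 8 \cdot B_{\mathsf{scale}}^3 - B_{\mathsf{scale}} + \frac{B_{\mathsf{scale}}}{p_1} < 0,$$

and so

$$p_1 \approx 8 \cdot (\xi + 1) \cdot B_{\mathsf{scale}}^2 \qquad (4)$$

will therefore guarantee the result.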
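Substituting (4) back into the left-hand side confirms this. Keeping the approximations already made above, a short check gives

$$\frac{\xi}{p_1} \cdot 8 \cdot B_{\mathsf{scale}}^3 = \frac{\xi}{\xi + 1} \cdot B_{\mathsf{scale}}, \qquad \text{so} \qquad \frac{\xi}{p_1} \cdot 8 \cdot B_{\mathsf{scale}}^3 - B_{\mathsf{scale}} + \frac{B_{\mathsf{scale}}}{p_1} = -\frac{B_{\mathsf{scale}}}{\xi + 1} + \frac{B_{\mathsf{scale}}}{p_1},$$

which is negative whenever $p_1 > \xi + 1$. This certainly holds here.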


We now need to estimate the size of $p_0$. Due to the above choice of $p_1$, the ciphertext to which we apply the distributed decryption has norm bounded by $B$, and we add to it a random encryption of zero at level $L$. To do this we need to apply SHE.LowerLevel to this encryption of zero. Hence the ciphertext we finally pass into SHE.ShareDec in our main MPC protocol has noise bounded by $B_{\mathsf{dec}} = 2 \cdot B$. Since correct decryption requires $2^{\mathsf{exp}} \cdot B_{\mathsf{dec}} < q_0/2$ (Section 7.5), we require

$$q_0 = p_0 > 2^{\mathsf{exp} + 2} \cdot B \qquad (5)$$

to ensure a valid distributed decryption.


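Finally, we set the Hamming weight $h$ of the secret key $\mathsf{sk}$ to be 64, as in [31, 22]. Plugging this into our equations (1), (2), (3), (4), and (5), we obtain

$$\begin{aligned}
p_0 &\approx 309 \cdot 2^{\mathsf{exp}} \cdot p \cdot \sqrt{N},\\
p_1 &\approx 107736 \cdot p^2 \cdot N,\\
p_i &\approx 1237 \cdot p \cdot \sqrt{N}, \quad \text{for } 2 \le i \le L - 1,\\
p_L &\approx 2.34 \cdot \sigma \cdot \sqrt{N},\\
P &\approx 0.404 \cdot 1237^L \cdot \sigma \cdot 2^{\mathsf{exp}} \cdot p^L \cdot N^{(L+2)/2},\\
q_{L-1} &\approx 21.76 \cdot 1237^L \cdot 2^{\mathsf{exp}} \cdot p^{L+1} \cdot N^{(L+1)/2}.
\end{aligned}$$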
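The leading constants can be re-derived mechanically from equations (1) to (5). The sketch below is a reconstruction under these assumptions:

- $\xi = 8$ and $h = 64$;
- $B \approx 2 \cdot B_{\mathsf{scale}}$ and $R_{L-2} \approx B_{\mathsf{scale}}$;
- only the leading-order terms of $B_{\mathsf{clean}}$ and $B_{\mathsf{KS}}$ are kept;
- the variable names are hypothetical.

It lands within 1% of every printed value. The small gaps presumably come from the text rounding the coefficient of $p_i$ to 1237.

```python
import math

XI = 8   # wiggle room xi
H = 64   # Hamming weight h of sk

b = math.sqrt(3) * (1 + 8 * math.sqrt(H) / 3)   # B_scale / (p * sqrt(N))
c_ks = 1.49 * math.sqrt(H) + 1.96               # leading term of B_KS / (p * sigma * N^{3/2})

c_p0 = 2**2 * (2 * b)            # (5) with B = 2 B_scale: p_0 / (2^exp * p * sqrt(N))
c_p1 = 8 * (XI + 1) * b**2       # (4): p_1 / (p^2 * N)
c_pi = 4 * XI * b                # (3) with R_{L-2} ~ B_scale: p_i / (p * sqrt(N))
c_pL = XI * 16 / math.sqrt(2) / b    # (1), leading sigma*N term of B_clean: p_L / (sigma * sqrt(N))

# (2): P ~ 8 xi B_KS q_{L-2} / B_scale with q_{L-2} = p_0 p_1 p_2 ... p_{L-2};
# coefficients below are relative to c_pi^L rather than to the rounded 1237^L.
c_P = 8 * XI * c_ks * c_p0 * c_p1 / (b * c_pi**3)
c_q = c_p0 * c_p1 / c_pi**2      # q_{L-1} = p_0 p_1 ... p_{L-1}
c_Q = c_P * c_q                  # Q_{L-1} = P * q_{L-1}

for name, value in [("p_0", c_p0), ("p_1", c_p1), ("p_i", c_pi), ("p_L", c_pL),
                    ("P", c_P), ("q_{L-1}", c_q), ("Q_{L-1}", c_Q)]:
    print(f"{name:8s} {value:.4g}")
```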
The largest modulus used in our key switching matrices, i.e. the largest modulus used in an LWE instance, is given by $Q_{L-1} = P \cdot q_{L-1}$. Using the above estimates we have

$$Q_{L-1} \approx 8.79 \cdot 1237^{2L} \cdot \sigma \cdot 2^{2 \cdot \mathsf{exp}} \cdot p^{2L+1} \cdot N^{(2L+3)/2}.$$

Recall from Section 7.5 that we have the following relationship between $\mathsf{exp}$ and our statistical security parameter $\mathsf{sec}$: $\mathsf{exp} = \mathsf{sec} + \log_2(N)$. To ensure security we use the estimates of Lindner and Peikert [34]: at the $\kappa$-bit security level we require

$$N \ge (\kappa + 110) \cdot \log(Q_{L-1}/\sigma)/7.2.$$

9  Estimating  the  Consumed  Bandwidth 

In  Section  8  we  determined  the  parameters  for  the  instantiation  of  our  SHE  scheme  using  BGV  by  adapting  the  analysis 
from  [22,  31].  In  this  section  we  use  this  parameter  estimation  to  show  that  our  MPC  protocol  can  in  fact  give  improved 
communication  complexity  compared  to  the  standard  MPC  protocols,  for  relatively  small  values  of  the  parameter  L. 
We are interested in the communication cost of our online stage computation. To ease our exposition we will focus on the passively secure case from Section 4. The analysis for the active security case with $t < n/3$ is exactly the same, bar the additional cost of exchanging zero-knowledge proofs in the input and output stages. For active security with $t < n/2$ we must also add the communication of the dispute control strategy outlined in Section 6, which is used to attain a robust SHE.ShareCombine with $t < n/2$. This cost is proportional to $O(n^3)$ and independent of the circuit size.

To get a feel for the parameters from Section 8, we now specialise to the case of finite fields of size $p \approx 2^{64}$, a statistical security parameter $\mathsf{sec}$ of 40, and various values of the computational security level $\kappa$. Resolving the various inequalities from Section 8, we then estimate in Table 1 the value of $N$, assuming a small value for $n$. We need to restrict to small $n$ to ensure a large enough range in the PRF used by the distributed decryption protocol; see Section 7.4.


| L | κ = 80 | κ = 128 | κ = 256 |
|---|---|---|---|
| 2 | 16384 | 16384 | 32768 |
| 3 | 16384 | 16384 | 32768 |
| 4 | 16384 | 32768 | 32768 |
| 5 | 32768 | 32768 | 65536 |
| 6 | 32768 | 32768 | 65536 |
| 7 | 32768 | 32768 | 65536 |
| 8 | 32768 | 65536 | 65536 |
| 9 | 32768 | 65536 | 65536 |
| 10 | 65536 | 65536 | 65536 |

Table 1. The value of N for various values of κ and L
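The table can be recomputed from the estimate of $Q_{L-1}$ in Section 8 and the Lindner-Peikert bound. This sketch makes the following assumptions:

- "log" in that bound is the natural logarithm;
- $N$ ranges over powers of two;
- $\mathsf{exp} = \mathsf{sec} + \log_2(N)$;
- the function names are hypothetical.

Under this reading it reproduces every entry of Table 1.

```python
import math

SEC, LOG2_P = 40, 64     # statistical security and log2(p) used for Table 1


def log_Q_over_sigma(L, N):
    """ln(Q_{L-1}/sigma) from the estimate of Q_{L-1} in Section 8 (natural log assumed)."""
    exp = SEC + math.log2(N)
    return (math.log(8.79) + 2 * L * math.log(1237) + 2 * exp * math.log(2)
            + (2 * L + 1) * LOG2_P * math.log(2) + (2 * L + 3) / 2 * math.log(N))


def smallest_N(L, kappa):
    """Smallest power of two N with N >= (kappa + 110) * log(Q_{L-1}/sigma) / 7.2."""
    N = 1024
    while N < (kappa + 110) * log_Q_over_sigma(L, N) / 7.2:
        N *= 2
    return N


for L in range(2, 11):
    print(L, [smallest_N(L, kappa) for kappa in (80, 128, 256)])
```

Because $N$ appears on both sides of the inequality, the search simply doubles $N$ until the bound is met.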


A Refresh gate requires each party to transmit $n - 1$ elements of the ring $R_{q_0}$ (namely its decryption share) to the other parties. The total communication in our protocol (in bits) is therefore

$$|G_R| \cdot n \cdot (n - 1) \cdot |R_{q_0}|,$$

where $|R_{q_0}|$ is the number of bits needed to transmit an element in $R_{q_0}$, i.e. $N \cdot \log_2 p_0$. Assuming the circuit meets our requirement of being well formed, the total communication cost for our protocol is

$$\frac{2 \cdot |G_M| \cdot n \cdot (n - 1) \cdot N \cdot \log_2 p_0}{L \cdot N} \approx \frac{2 \cdot n \cdot (n - 1) \cdot |G_M|}{L} \cdot \log_2\left(309 \cdot 2^{\mathsf{sec}} \cdot p \cdot N \sqrt{N}\right).$$

The batch distributed decryption technique from Section 7.6 evaluates $t + 1$ independent Refresh gates in parallel. Using it, this can be reduced to

$$\mathsf{Cost} = \frac{4 \cdot n \cdot (n - 1) \cdot |G_M|}{L \cdot (t + 1)} \cdot \log_2\left(309 \cdot 2^{\mathsf{sec}} \cdot p \cdot N \sqrt{N}\right).$$
We are interested in the overhead per multiplication gate, in terms of equivalent numbers of finite field elements in $\mathbb{F}_p$. This overhead is given by $\mathsf{Cost}/(|G_M| \cdot \log_2 p)$, and the cost per party is $\mathsf{Cost}/(|G_M| \cdot n \cdot \log_2 p)$.

At the 128-bit security level, with $p \approx 2^{64}$ and $\mathsf{sec} = 40$ (along with the above estimated values of $N$), for $n = 3$ parties and at most $t = 1$ corruption we obtain the following cost estimates:


| L | Total cost | Per-party cost |
|---|---|---|
| 2 | 12.49 | 4.16 |
| 3 | 8.33 | 2.77 |
| 4 | 6.31 | 2.10 |
| 5 | 5.05 | 1.68 |
| 6 | 4.21 | 1.40 |
| 7 | 3.61 | 1.20 |
| 8 | 3.19 | 1.06 |
| 9 | 2.84 | 0.94 |
| 10 | 2.55 | 0.85 |
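The table above can be recomputed from the batched cost formula. The sketch below makes these assumptions:

- $p_0 \approx 309 \cdot 2^{\mathsf{sec}} \cdot p \cdot N \sqrt{N}$, i.e. $\mathsf{exp} = \mathsf{sec} + \log_2(N)$;
- $N$ is taken from the $\kappa = 128$ column of Table 1;
- the function name is hypothetical.

It matches every printed entry to within 0.01. The residual differences presumably reflect rounding in the original computation.

```python
import math

# N from Table 1 at the 128-bit level (kappa = 128).
N_128 = {2: 16384, 3: 16384, 4: 32768, 5: 32768, 6: 32768,
         7: 32768, 8: 65536, 9: 65536, 10: 65536}


def cost_per_mult_gate(L, N, n=3, t=1, log2_p=64, sec=40):
    """Cost / (|G_M| * log2 p) with batched distributed decryption."""
    log2_p0 = math.log2(309 * 2**sec * N * math.sqrt(N)) + log2_p   # p_0 ~ 309 2^sec p N sqrt(N)
    return 4 * n * (n - 1) * log2_p0 / (L * (t + 1) * log2_p)


for L, N in N_128.items():
    total = cost_per_mult_gate(L, N)
    print(f"L={L:2d}  total={total:.2f}  per party={total / 3:.2f}")
```

With $t = 1$ the batching factor $2/(t + 1)$ equals one, so here the batched and unbatched costs coincide.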

Note that for $L = 2$ our protocol becomes one which requires interaction after every multiplication, for $L = 3$ we require interaction only after every two multiplications, and so on. Note also that most practical MPC protocols in the preprocessing model have a per-gate, per-party communication cost of at least 2 finite field elements, e.g. [25]. Thus, already for $L = 5$ (a per-party cost of 1.68 field elements per gate), we obtain better communication efficiency in the online phase than traditional practical protocols in the preprocessing model with these parameters.
Abstract. We propose the first general framework for designing actively secure private function evaluation (PFE), not based on universal circuits. Our framework is naturally divided into pre-processing and online stages and can be instantiated using any generic actively secure multiparty computation (MPC) protocol.

Our framework helps address the main open questions about the efficiency of actively secure PFE. On the theoretical side, our framework yields the first actively secure PFE with linear complexity in the circuit size. On the practical side, we obtain the first actively secure PFE for arithmetic circuits with $O(g \cdot \log g)$ complexity, where $g$ is the circuit size. The best previous construction (of practical interest) is based on an arithmetic universal circuit and has complexity $O(g^5)$.

We also introduce the first linear Zero-Knowledge proof of correctness of "extended permutation" of ciphertexts (a generalization of ZK proofs of correct shuffles), which may be of independent interest.

Keywords. Secure Multi-Party Computation, Private Function Evaluation, Malicious Adversary, Zero-Knowledge Proof of Shuffle


1  Introduction 

Private Function Evaluation (PFE) is a special case of Multi-Party Computation (MPC), where the parties compute a function which is a private input of one of the parties, say party $P_1$. The key additional security requirement is that the only information about the function that may leak to an adversary who does not control $P_1$ is the size of the circuit (i.e. the number of gates and distinct wires within the circuit). Clearly, PFE follows immediately from MPC by designing an MPC functionality which implements a universal machine/circuit; thus the only open questions in PFE research are those of efficiency. Using universal circuits one can achieve complexity of $O(g^5)$ in the case of arithmetic circuits [23] and $O(g \cdot \log g)$ for boolean circuits [26]. For ease of exposition we ignore the factors depending on the number of parties and the security parameters, as they depend on the particular underlying MPC being used. We still provide some numbers for the specific SPDZ instantiation in Section 5.

A number of previous works [1,2,4,12,14,15,16,17,22,24] have considered the design and implementation of more efficient general- and special-purpose private function evaluation. A major motivation behind these solutions (and PFE in general) is to hide the function being computed, since it is proprietary, private or contains sensitive information. Some applications of interest considered in the literature are software diagnostics [4], medical applications [2], and intrusion detection systems [20].

However, all prior solutions are in the semi-honest model and fail in the presence of an active adversary who does not follow the steps of the protocol (with the exception of the generic approach of applying an actively secure MPC to universal circuits). For example, a malicious party who does not own the function can cheat to learn the proprietary function, or modify the outcome of the computation without the function-holder's knowledge. Alternatively, a malicious function-holder can learn information about honest parties' inputs.

This article is the full version of an earlier article: Asiacrypt 2014, (c) IACR 2014, http://dx.doi.org/10.1007/978-3-662-45608-8_26.
One may question the need for actively secure PFE, as the function-holder can cheat and use a malicious
function,  which  reveals  information  about  the  other  party’s  input.  While  we  consider  the  general  scenario 
in  our  protocols,  there  are  common  practical  scenarios  where  the  function-holder  has  no  output  in  the 
computation,  and  therefore  maliciously  changing  the  function  still  does  not  let  him  learn  anything  even  if 
he  is  actively  cheating. 

1.1  Our  Contribution 

In  this  work,  we  present  the  first  general  framework  for  designing  actively  secure  PFE,  not  based  on  universal 
circuits.  Our  framework  can  be  instantiated  upon  a  generic  actively  secure  MPC  protocol  satisfying  quite 
general  properties;  namely  that  they  are  secret  sharing  based,  actively  secure  (either  robust  or  with  aborts), 
can  implement  reactive  functionalities,  and  have  an  ability  to  open  various  sharings  securely,  as  well  as 
generate  (efficiently)  sharings  of  random  values.  Suitable  actively  secure  MPC  protocols  include  BDOZ  [3] 
and  SPDZ  [8]  (for  the  case  of  arithmetic  circuits  and  an  arbitrary  number  of  players  with  a  dishonest 
majority),  Tiny-OT  [19]  (for  binary  circuits  and  two  players),  or  protocols  such  as  that  implemented  in 
VIFF  [7]  utilizing  Shamir  secret  sharing  with  a  threshold  of  t  <  n/3. 

Our framework helps address the main open questions about the efficiency of actively secure PFE. On a theoretical note, we use it to show that actively secure PFE with linear complexity (in circuit size) is indeed feasible while avoiding strong primitives such as fully-homomorphic encryption (FHE).* On a practical note, we obtain a practical actively secure PFE for arithmetic circuits with $O(g \cdot \log g)$ complexity (a significant reduction from $O(g^5)$ [23]), and the first actively secure PFE in the information-theoretic setting.

Our  Framework.  Our  framework  can  be  seen  as  an  extension  of  the  new  framework  of  [17]  which  is  only 
secure  against  passive  adversaries.  The  key  idea  in  [17]  is  to  divide  the  problem  into  two  sub-problems,  the 
problem  of  hiding  the  topology  of  the  wiring  between  individual  gates  (topology  hiding),  and  the  problem  of 
hiding exactly what gate is evaluated (gate hiding), i.e. an addition or a multiplication (or AND/OR/XOR in the case of boolean circuits).

This  framework  yields  better  asymptotic  and  practical  efficiency  for  passively  secure  PFE  compared  to 
the  universal  circuit  approach  (see  [17]  for  a  detailed  efficiency  comparison).  An  important  open  question  is 
then  how  to  extend  their  solution  to  the  case  of  active  adversaries  efficiently.  In  this  paper  we  do  exactly  that 
by  providing  a  recipe  for  turning  any  actively  secure  MPC  protocol  that  satisfies  our  general  requirements 
into  an  actively  secure  PFE  protocol. 

Our framework operates in two phases, an offline phase and an online phase. As in the case of standard MPC in the pre-processing model, our offline phase is input independent, but it depends on the function. The offline phase is use-once, in the sense that the data produced cannot be reused for multiple invocations of the online phase. We note that a similar function-dependent pre-processing model (referred to as dedicated pre-processing) was recently considered in [9]. Dedicated pre-processing is particularly natural in PFE applications where the sensitive/proprietary function stays fixed for a period of time and is used in multiple executions. Clearly, in the latter case we need to execute the pre-processing multiple times, but this can be done in advance. Of course, if one is not willing to count a function-dependent offline phase as valid, then our complexities would be the combination of the two phases. It may be the case that our underlying MPC protocol is itself in the pre-processing model (e.g. [3,8,19]), in which case that pre-processing will be essentially independent of the input and the function being evaluated. Our framework shows the feasibility of offline computation independent of inputs, which was not the case in [17]. We elaborate on the two phases next:

Offline Phase. Roughly speaking, our offline phase generates two vectors of random values, maps the second to a new vector using a mapping that captures the topology of the circuit (referred to as an extended permutation in [17]), and subtracts the result from the first. The result of the subtraction (the “difference vector”) is opened, while the two original vectors remain shared among the parties. The two random vectors are used as one-time pads for all the intermediate values in the circuit, while the “difference vector” is used by the function holder to connect the output of one gate to the input of another without learning the values or revealing the circuit topology. The offline phase also generates one-time MACs of all the components of the “difference vector”, using a fixed global MAC key. These MACs are used to check the function holder's work in the online phase of the protocol. These steps privately commit P_1 to the topology of the circuit. We also privately commit P_1 to the gate types, hence fully committing him to the function being computed.

* Note that with the use of the right circuit-private FHE scheme [21], and appropriate ZK proofs for correctness of the computation on encrypted data, it is likely possible to achieve linear PFE based on FHE, but we are interested in the use of much weaker primitives such as singly homomorphic encryption.

Online Phase. Our online (circuit evaluation) phase is very different from that of the underlying MPC protocol we use. In existing instantiations of the underlying MPC protocol, parties evaluate gates on values whose secrecy is maintained because one works on secret-shared values only. In our protocol the parties hold public one-time pad encryptions of the values being computed on, while the encryption keys, which are the random values generated in the offline phase, remain secret shared. Party P_1 (the function holder) then uses the random vectors computed in the offline phase to transform the encrypted output of one gate into the encrypted input of the next gate, while maintaining one-time MACs of all the values he computes. These MACs allow all other parties to check P_1's work without learning the circuit topology. These operations are carried out securely using the underlying MPC protocol.

In both the online and the offline phase, all parties check P_1's work by checking the MACs of the values he computes locally. If any of the MACs fail, then in the case of security with abort the parties simply end the protocol. But in the case of robust MPC (e.g. t < n/3 for robust information-theoretically secure protocols) the protocol needs to continue without P_1. To achieve this, the honest parties jointly recover P_1's function and play his role in the remainder of the protocol.

In our protocols, if the adversary deviates from the protocol then, except with negligible probability, the honest parties will either abort or be able to recover from the introduced error. The exact response depends on the underlying MPC protocol on which our PFE protocol is built. In all cases the privacy of the honest players' inputs is preserved, bar what can be obtained from the output of the private function chosen by player P_1. Note that P_1 may or may not be a recipient of output, but many applications of PFE are concerned with scenarios where the function holder has no output.


Efficient Instantiations. One can efficiently instantiate our online phase with linear complexity, using any actively secure MPC protocol satisfying our requirements. The main challenge, therefore, lies in an efficient instantiation of the offline phase. It is also possible to implement our offline phase using any actively secure MPC sub-protocol (by securely computing a circuit that performs the above-mentioned task), but the resulting constructions would be neither linear nor constant-round.

- We introduce an instantiation with O(g) complexity, proving the feasibility of linear actively secure PFE for the first time. Our main new technical ingredient is a linear zero-knowledge (ZK) proof of “correct extended permutation” of ElGamal ciphertexts. While linear ZK proofs of shuffles are well studied, it is not clear how to extend the techniques to extended permutations (see our incomplete attempt in Appendix B). Instead, we propose a generic and linear solution that uses ZK proofs of a correct shuffle in a black-box manner, and may be of independent interest. Our solution is based on the switching network construction of EP [17]. This construction consists of three components, two of which are permutation networks. Instead of evaluating switches, we use singly homomorphic encryption to evaluate each component, and then re-randomize. We use existing ZK proofs of shuffle to prove the correctness of the first and third components, which perform permutations. The middle component requires a separate compilation of ZK protocols. Note that generically applying ZK proofs to UC circuit evaluation does not provide a linear solution, and applying ZK proofs for the EP component also does not work. Our customized linear ZK_EP gets around these problems.

- We introduce a constant-round instantiation with O(g log g) complexity (in contrast with O(g^5) complexity for universal arithmetic circuits) that is also of practical interest. Our technique is itself an extension of ideas from [17]. In particular the basic algorithm is that of [17] for oblivious evaluation of a switching network, but some care needs to be taken to make sure the protocol is actively secure. This is done by applying MACs to the data being computed on. However, instead of the MAC values being secret shared (as in SPDZ) or kept secret (as in BDOZ and Tiny-OT), the MAC values are public, with the keys remaining secret shared. Nevertheless, the MACs used are very similar to those used in the BDOZ and Tiny-OT protocols [3,19], since they are two-key MACs in which one key is a per-message key and one is a global key. While using MACs is quite standard for ensuring consistency of data, our efficient deployment of them in the framework is non-trivial and novel. For example, while the addition of MACs in the offline phase is done using a generic MPC, the circuit evaluation (online phase) does not use an MPC. This is different from the approach of [17] and from previous MPC work. General active security techniques cannot be directly employed in this context. For instance, it is not clear how to use cut-and-choose in the case of PFE: it is not clear how to avoid revealing the function in the opening, and there are additional components (i.e. the EP) in a PFE protocol which cut-and-choose does not seem to resolve.
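To make the public-MAC idea concrete, here is a minimal Python sketch. It assumes plain additive secret sharing over a prime field Z_P (with P = 2^61 - 1, our own illustrative choice) as a stand-in for the underlying MPC; the helper names `share`, `open_shares` and `mac_shares` are hypothetical. The tag t + m·K on a public message m is itself public, while the per-message key t and the global key K stay shared, so checking a tag only requires opening a linear function of the shares.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus standing in for F_{p^k}


def share(x, n):
    """Additively share x among n parties: the shares sum to x mod P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts


def open_shares(parts):
    """Reconstruct a shared value by summing all shares."""
    return sum(parts) % P


def mac_shares(t_sh, K_sh, m):
    """Each party locally computes its share of t + m*K (m is public)."""
    return [(t + m * k) % P for t, k in zip(t_sh, K_sh)]


n = 3
K = secrets.randbelow(P - 1) + 1   # global MAC key (non-zero), never reconstructed
t = secrets.randbelow(P)           # per-message MAC key
K_sh, t_sh = share(K, n), share(t, n)

m = 42                             # public (one-time padded) message
# The public tag; computed in the clear here only for illustration,
# in the protocol it is produced inside the MPC.
tag = (t + m * K) % P

# Verification: open [t] + m*[K] and compare with the public tag.
ok = open_shares(mac_shares(t_sh, K_sh, m)) == tag
# A modified message does not verify against the same tag (since K != 0).
forged = open_shares(mac_shares(t_sh, K_sh, m + 1)) == tag
```

Changing m by δ shifts the opened value by δ·K, so a party who does not know the shared K cannot adjust the public tag to match.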

Efficiency Discussion. We emphasize that our linear complexity solution is a feasibility result, as it was an open question whether actively secure PFE with complexity linear in the circuit size is possible given simple cryptographic primitives such as singly homomorphic encryption (as opposed to FHE). Our “efficient” arithmetic PFE requires only O(g log g) multiplication gates, which is a significant improvement compared with applying arithmetic MPC to a universal arithmetic circuit of size O(g^5) [23]. If we apply actively secure MPC for arithmetic circuits to this universal circuit, the complexity cannot be better than O(g^5). One can turn an arithmetic circuit into a boolean circuit and use Valiant's boolean UC [26] to obtain a PFE, but this is highly inefficient, and therefore we do not discuss it in detail.

2  Notation  and  the  Underlying  MPC  Protocol 

We assume that the function f to be evaluated will eventually be given by player P_1 as an arithmetic circuit over a finite field F_p; note that p is not necessarily prime. We let g(f) denote the number of gates in the circuit representing f. For gates with fan-out greater than one, we count each separate output wire as a different wire. We also select a value k such that $p^k > 2^{\mathrm{sec}}$, where sec is the security parameter; this ensures the security of our MAC checking procedure in the online phase.
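The condition p^k > 2^sec can be met by taking the smallest such k. A short sketch follows; the helper name is ours, not the paper's. Intuitively, a forgery against a MAC of the form t + m·K succeeds only by guessing the global key, which is uniform in F_{p^k}, so this choice keeps that guessing probability below 2^{-sec}.

```python
def min_extension_degree(p, sec):
    """Smallest k >= 1 with p**k > 2**sec (hypothetical helper)."""
    k = 1
    while p ** k <= 2 ** sec:
        k += 1
    return k


# Binary base field: need 129 bits of extension for 128-bit security.
k_binary = min_extension_degree(2, 128)
# A 61-bit Mersenne prime base field: three copies of the field suffice.
k_mersenne = min_extension_degree(2**61 - 1, 128)
```

Working in the extension F_{p^k} rather than F_p is what lets a small base field still give negligible forging probability.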

We assume n parties P_1, ..., P_n, of which an adversary may (statically) corrupt up to t; the value of t depends on the specific underlying MPC protocol. The corrupted parties could include party P_1. The MPC protocol should implement the functionality described in Figure 1. This functionality is slightly different from standard MPC functionalities in that we try to capture both the honest majority and the dishonest majority setting; in the latter setting the adversary can force the functionality to abort at any stage of the computation, and not just at the output. We also introduce another operation called Cheat, which will be useful in what follows.

It is clear that modern actively secure MPC protocols such as [7,8,19] implement this functionality in different settings. Thus various settings (i.e. different values of n, p and t) can be handled in our resulting PFE protocol simply by plugging in a different underlying MPC protocol. To ease exposition later, we express our MPC protocol as evaluating functions in the finite field F_{p^k}. Clearly such an MPC protocol can be built out of one which evaluates functions over the base finite field F_p.

To ease notation in what follows we let [varid] denote the value stored by the functionality under (varid, x), and write $[z] = [x] + [y]$ as shorthand for calling Add and $[z] = [x] \cdot [y]$ as shorthand for calling Multiply. By abuse of notation we will also let varid denote the value, x, of the data item held in location (varid, x).

3  Our  Active  PFE  Framework 

In this section we describe our active PFE framework in detail. We start by describing the offline functionality, which pre-processes the function/circuit the parties want to compute (Section 3.1). Then, in Section 3.2, we show that given a secure implementation of F_offline, one can efficiently (with linear complexity) construct an actively secure PFE based on any actively secure MPC. We postpone efficient instantiations of F_offline to later sections.

Functionality F_mpc

The functionality consists of seven externally exposed commands, Initialize, Cheat, Input Data, Random, Add, Multiply and Output, and one internal subroutine, Wait.

Initialize: On input (init, p, k, flag) from all parties, the functionality activates and stores p and k, and a representation of F_{p^k}. The value of flag is assigned to the variable dhm, to signal whether the MPC functionality should operate in the dishonest majority setting. The set of “valid” players is initially set to all players. In what follows we denote the set of adversarial players by A.

Cheat: This command takes as input a player index i; it models the case of (most) robust MPC protocols in the honest majority case. On execution the functionality aborts if dhm is set to true. Otherwise the functionality waits for input from all players. If a majority of the players return OK, then the functionality reveals all inputs made by player i, and player i is removed from the list of “valid” players (the functionality continues as if player i does not exist).

Wait: This does one of two things, depending on the value of dhm.

- If dhm is set to true, then it waits on the environment to return a GO/NO-GO decision. If the environment returns NO-GO, then the functionality aborts.

- If dhm is set to false, then it waits on the environment. The environment will either return GO, in which case Wait does nothing, or return a value i ∈ A, in which case Cheat(i) is called.

Input Data: On input (input, P_i, varid, x) from P_i and (input, P_i, varid, ?) from all other parties, with varid a fresh identifier, the functionality stores (varid, x). The functionality then calls Wait.

Random: On command (random, varid) from all parties, with varid a fresh identifier, the functionality selects a random value r in F_{p^k} and stores (varid, r). The functionality then calls Wait.

Add: On command (add, varid_1, varid_2, varid_3) from all parties (if varid_1, varid_2 are present in memory and varid_3 is not), the functionality retrieves (varid_1, x), (varid_2, y) and stores (varid_3, x + y). The functionality then calls Wait.

Multiply: On input (multiply, varid_1, varid_2, varid_3) from all parties (if varid_1, varid_2 are present in memory and varid_3 is not), the functionality retrieves (varid_1, x), (varid_2, y) and stores (varid_3, x · y). The functionality then calls Wait.

Output: On input (output, varid) from all honest parties (if varid is present in memory), the functionality retrieves (varid, x) and outputs it to the environment. The functionality then calls Wait, and only if Wait does not abort does it output x to all players.

Fig. 1: The required ideal functionality for MPC
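As an aid to reading Figure 1, the following is a minimal single-process Python model of F_mpc. It is a sketch under stated assumptions, not an implementation of any MPC protocol: values live in Z_P for a prime P standing in for F_{p^k}, the environment is modelled by a callback `env`, and in the honest-majority case Cheat simply assumes that a majority of players answered OK. All class and method names are hypothetical.

```python
import secrets


class Abort(Exception):
    """Raised when the ideal functionality aborts."""


class ToyFmpc:
    """Single-process toy model of the F_mpc functionality (Figure 1).

    `env` plays the environment inside Wait: it returns "GO", "NO-GO" or,
    in the honest-majority case, the index of a corrupt player to expel.
    """

    def __init__(self, P, n, dhm, env=lambda: "GO"):
        self.P, self.dhm, self.env = P, dhm, env
        self.valid = set(range(1, n + 1))            # the "valid" players
        self.mem = {}                                # varid -> stored value
        self.inputs_by = {i: [] for i in self.valid}  # varids input by each player
        self.revealed = {}                           # player -> inputs opened by Cheat

    def _check_fresh(self, varid):
        if varid in self.mem:
            raise ValueError(f"identifier {varid!r} is not fresh")

    def wait(self):
        decision = self.env()
        if self.dhm:
            if decision == "NO-GO":
                raise Abort("environment returned NO-GO")
        elif decision != "GO":
            self.cheat(decision)                     # decision is a player index

    def cheat(self, i):
        if self.dhm:
            raise Abort("Cheat aborts in the dishonest-majority setting")
        # Assume a majority of players answered OK: open i's inputs, expel i.
        self.revealed[i] = {v: self.mem[v] for v in self.inputs_by[i]}
        self.valid.discard(i)

    def input_data(self, i, varid, x):
        self._check_fresh(varid)
        self.mem[varid] = x % self.P
        self.inputs_by[i].append(varid)
        self.wait()

    def random(self, varid):
        self._check_fresh(varid)
        self.mem[varid] = secrets.randbelow(self.P)
        self.wait()

    def add(self, a, b, c):
        self._check_fresh(c)
        self.mem[c] = (self.mem[a] + self.mem[b]) % self.P
        self.wait()

    def multiply(self, a, b, c):
        self._check_fresh(c)
        self.mem[c] = (self.mem[a] * self.mem[b]) % self.P
        self.wait()

    def output(self, varid):
        x = self.mem[varid]
        self.wait()                                  # an abort here withholds x
        return x
```

For example, `ToyFmpc(P=101, n=3, dhm=True)` followed by two `input_data` calls and an `add` behaves like the shorthand [z] = [x] + [y].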

3.1  The  Function  Pre-Processing  (Offline)  Phase 

In this section we detail the requirements of our pre-processing step once player P_1 has decided on the function f to be evaluated. P_1 is only required to enter into the protocol a valid circuit equivalent to his function f. Each non-output wire w in the circuit is connected at one end (which we call the outgoing wire or left point) to a source, which is either the output of a (non-output) gate or an input wire. Conversely, each non-output wire is connected at the other end (which we call the incoming wire or right point) to a destination point, which is always an input to a gate. We denote the number of distinct incoming wires on the right by iw(f), and we let ow(f) denote the number of outgoing wires on the left. Note that $\mathrm{iw}(f) = 2g$ and $\mathrm{ow}(f) = n + g - o$, where o is the number of output gates in the circuit. Since we are dealing with arbitrary fan-out, we have $\mathrm{ow}(f) \le \mathrm{iw}(f)$.

Functionality F_offline

Initialize: As for F_mpc.

Wait: As for F_mpc.

Input Data: As for F_mpc.
Cheat: As for F_mpc.
Random: As for F_mpc.
Add: As for F_mpc.
Multiply: As for F_mpc.
Output: As for F_mpc.

Input Function: On input (inputfunction, π, f) from player P_1 the functionality performs the following operations:

- The functionality calls (random, K).

- If f is not a valid arithmetic circuit, then the functionality aborts.

- For i ∈ {1, ..., iw(f)} the functionality calls (random, r_i) and (random, s_i).

- For j ∈ {1, ..., ow(f)} the functionality calls (random, ℓ_j) and (random, t_j).

- The functionality then computes, for all i ∈ {1, ..., iw(f)},

$$[p_i] = [r_i] - [\ell_{\pi(i)}], \qquad [q_i] = \bigl([s_i] - [t_{\pi(i)}]\bigr) + \bigl([r_i] - [\ell_{\pi(i)}]\bigr) \cdot [K].$$

- The functionality then outputs (p_i, q_i) to all players, for i ∈ {1, ..., iw(f)}, by calling (output, p_i) and (output, q_i).

- For i ∈ {1, ..., g} the functionality calls (input, P_1, G_i, 0) if gate i in the description of f is an addition gate, and (input, P_1, G_i, 1) if gate i is a multiplication gate.

Fig. 2: The required ideal functionality for the Offline Phase


To fully capture the topology of the circuit we give each outgoing wire and each incoming wire in the circuit a unique label. The labels for the outgoing wires are {1, ..., ow(f)}, starting from the input wires and then moving to the output wires of each gate in a topological order decided by P_1, whilst the labels for the incoming wires are {1, ..., iw(f)}, labelling the input wires to each gate in the same topological order. The topology is then defined by a mapping from outgoing wires to incoming wires, which is called an “extended permutation” in [17], as demonstrated in Figure 3. We denote the inverse of this mapping by a function π from {1, ..., iw(f)} onto {1, ..., ow(f)}. If w is a wire in the circuit with incoming wire label i, then its outgoing wire label is given by $j = \pi(i)$.
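The labelling can be sketched in Python as follows. The circuit format (a list of `(kind, src0, src1)` gates in P_1's topological order, where a source is `("in", k)` for input wire k or `("gate", h)` for the output of gate h) is our own hypothetical choice, and we assume that output gates receive no outgoing label, so that the counts match iw(f) = 2g and ow(f) = n + g - o.

```python
def wire_mapping(n, gates, outputs):
    """Return (pi, ow): pi maps incoming labels to outgoing labels.

    n:       number of circuit input wires (outgoing labels 1..n).
    gates:   list of (kind, src0, src1); gate i is gates[i - 1].
    outputs: set of indices of output gates (they get no outgoing label).
    """
    out_label = {("in", k): k for k in range(1, n + 1)}
    nxt = n + 1
    for i in range(1, len(gates) + 1):           # label gate outputs in order
        if i not in outputs:
            out_label[("gate", i)] = nxt
            nxt += 1
    pi = {}
    for i, (_, src0, src1) in enumerate(gates, start=1):
        pi[2 * i - 1] = out_label[src0]          # incoming label i_0 = 2i - 1
        pi[2 * i] = out_label[src1]              # incoming label i_1 = 2i
    return pi, nxt - 1                           # nxt - 1 equals ow(f)


# Example: gate 1 = x1 + x2, gate 2 = g1 * x3, gate 3 = g1 + g2 (output gate).
gates = [("add", ("in", 1), ("in", 2)),
         ("mul", ("gate", 1), ("in", 3)),
         ("add", ("gate", 1), ("gate", 2))]
pi, ow = wire_mapping(3, gates, outputs={3})
```

In the example the output of gate 1 (outgoing label 4) has fan-out two, so 4 appears twice among the values of π; this is exactly what makes π an extended, rather than ordinary, permutation.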

[Figure not reproduced: example circuit C, its outgoing wires OW_1, OW_2, ..., and the mapping to its 2g = 8 incoming wires]

Fig. 3: An example circuit and the corresponding mapping [17]

To execute the function pre-processing, player P_1, on input f, determines a mapping π corresponding to f. The offline phase functionality F_offline, which is described in Figure 2, extends the F_mpc functionality of Figure 1 by adding an additional operation, Input Function. Input Function generates a vector of random (but correlated) values and their one-time MACs using a fixed global MAC key K. In particular, the functionality first stores a vector of random values (r_i) for the incoming wires and another vector of random values (ℓ_j) for the outgoing wires in the circuit. These random values play the role of “pads” for one-time encryption of the computed wire values in the online phase. The functionality then computes p_i, the difference between each incoming wire's pad and the pad of its corresponding outgoing wire, $p_i = r_i - \ell_{\pi(i)}$, and reveals p_i to all parties. This difference vector allows P_1 to maintain a one-time encryption of each wire value in the online phase without revealing the circuit topology. Additional random values (s_i, t_j) and the global MAC key K are used to compute a one-time MAC q_i of each p_i. These MACs are used to check P_1's actions in the online phase. Input Function also commits P_1 to the function of each gate in his circuit by storing a bit G_i (0 for addition and 1 for multiplication) for each gate.
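A minimal in-the-clear sketch of Input Function follows, assuming a prime field Z_P with P = 2^61 - 1 in place of F_{p^k}. In F_offline the values K, r_i, s_i, ℓ_j, t_j and the gate bits stay secret shared and only p_i and q_i are opened; the sketch computes everything in one place purely to show the arithmetic of Figure 2. Function and key names are hypothetical.

```python
import secrets

P = 2**61 - 1   # illustrative prime standing in for F_{p^k}


def rand():
    return secrets.randbelow(P)


def input_function(pi, ow, gate_types):
    """In-the-clear toy version of Input Function (Figure 2).

    pi:         dict mapping incoming labels 1..iw(f) to outgoing labels 1..ow(f).
    gate_types: list of "add" / "mul" in P_1's topological order.
    Returns (secret, p, q): `secret` holds what F_offline keeps shared,
    while p and q are the values output to all players.
    """
    iw = len(pi)
    K = rand()                                        # global MAC key
    r = {i: rand() for i in range(1, iw + 1)}         # incoming-wire pads
    s = {i: rand() for i in range(1, iw + 1)}         # incoming-wire MAC keys
    ell = {j: rand() for j in range(1, ow + 1)}       # outgoing-wire pads
    t = {j: rand() for j in range(1, ow + 1)}         # outgoing-wire MAC keys
    p = {i: (r[i] - ell[pi[i]]) % P for i in r}       # difference vector
    q = {i: ((s[i] - t[pi[i]]) + (r[i] - ell[pi[i]]) * K) % P for i in r}
    G = [0 if kind == "add" else 1 for kind in gate_types]  # gate-type bits
    secret = {"K": K, "r": r, "s": s, "ell": ell, "t": t, "G": G}
    return secret, p, q


# pi for the circuit x1 + x2 -> g1, g1 * x3 -> g2, g1 + g2 (output).
pi = {1: 1, 2: 2, 3: 4, 4: 3, 5: 4, 6: 5}
sec, p, q = input_function(pi, 5, ["add", "mul", "add"])
```

Each p_i on its own is uniformly distributed (it is masked by the fresh pad r_i), which is why it can be opened without leaking anything about π.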

3.2  The  Function  Evaluation  (Online)  Phase 

We can now present our framework for actively secure PFE. We wish to implement the functionality in Figure 4. We express the functionality as evaluating a function f provided by P_1, which takes n inputs in F_{p^k}, one from each player. Again we present the functionality in both the honest majority and the dishonest majority settings.

Realizing F_online given F_offline and F_mpc. A generic instantiation of F_online based on any MPC is given in Figure 6. The idea is to work with one-time pad encryptions of the values on all intermediate wires, together with the corresponding one-time MACs. Here, the pads (the r_i and ℓ_j values), as well as the MAC key K, are generated by the offline functionality and shared among the parties, so no party can learn intermediate values or forge MACs on his own.

In more detail, the protocol proceeds as follows. Initially, the parties compute one-time encryptions of the input values to the circuit (the pads are the corresponding ℓ values). Then the following process is repeated for each gate in the circuit until every gate has been processed. Finally, the parties open the outcome of the output gates as their final result.

For each gate, party P_1 uses the “difference vector” (the p_i values) from the offline phase to transform the one-time encryption of the output of a previous gate into the one-time encryption of an input of the current gate (the results are denoted by d_{i_0}, d_{i_1} for the i-th gate), without revealing the topology or learning the actual wire values. This is presented diagrammatically in Figure 5 to aid the reader. A similar transformation is performed on the MACs of the wire values (using the q_i values), in order to keep P_1 honest in his computation (the results are denoted by m_{i_0}, m_{i_1}).
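The arithmetic behind P_1's transformation and the matching MAC check can be traced on a single wire with the following Python sketch. It assumes a prime field Z_P (P = 2^61 - 1, an illustrative choice) and computes in the clear values that the protocol keeps shared; variable names follow the text, with `ell_j` for ℓ_j.

```python
import secrets

P = 2**61 - 1   # illustrative prime standing in for F_{p^k}


def rand():
    return secrets.randbelow(P)


# Offline values for one incoming wire i whose source is outgoing wire j = pi(i).
# F_offline keeps K, r_i, s_i, ell_j, t_j shared; they are in the clear here.
K, r_i, s_i, ell_j, t_j = (rand() for _ in range(5))
p_i = (r_i - ell_j) % P                 # public difference
q_i = ((s_i - t_j) + p_i * K) % P       # public MAC of the difference

# Online: the parties have opened the encryption of x on wire j and its MAC.
x = 987654321                           # wire value, known to nobody
u_j = (x + ell_j) % P
v_j = (t_j + u_j * K) % P

# P_1's local step uses public values only.
d_i = (p_i + u_j) % P                   # equals x + r_i
m_i = (q_i + v_j) % P                   # equals s_i + d_i * K

# Check done inside the MPC: open n_i = [s_i] + d_i * [K] and compare with m_i.
n_i = (s_i + d_i * K) % P
accepted = n_i == m_i
```

The pad changes from ℓ_j to r_i without the value x ever being exposed, and the MAC follows the same change of keys from t_j to s_i.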

Functionality F_online

Initialize: On input (init, p, k, flag) from all players, the functionality activates and stores p and k, and a representation of F_{p^k}. The value of flag is assigned to the variable dhm, to signal whether the underlying MPC functionality should operate in the dishonest majority setting.

Wait: If dhm is set to false, then this does nothing. Otherwise it waits on the environment to return a GO/NO-GO decision. If the environment returns NO-GO, then the functionality aborts.

Input Function: On input (inputfunction, f) from player P_1 the functionality stores (function, f). The functionality now calls Wait.

Input Data: On input (input, P_i, x_i) from player P_i the functionality stores (input, i, x_i). The functionality now calls Wait.

Output: On input (output) from all honest players the functionality retrieves the data x_i stored in (input, i, x_i) for i ∈ {1, ..., n} (if any of these does not exist, then the functionality aborts). The functionality then retrieves f from (function, f), computes $y = f(x_1, \ldots, x_n)$ and outputs it to the environment (or aborts if (function, f) has not been stored). The functionality now calls Wait. Only on a successful return from Wait will the functionality output y to all players.

Fig. 4: The required ideal functionality for PFE

Offline (F_offline): $p_i = r_i - \ell_{\pi(i)}$.

Online:

1. Prepare the outgoing wire: $u_{\pi(i)} = x_{\pi(i)} + \ell_{\pi(i)}$.

2. P_1 computes the incoming wire's encryption:

$$d_i = u_{\pi(i)} + p_i = x_{\pi(i)} + \ell_{\pi(i)} + r_i - \ell_{\pi(i)} = x_{\pi(i)} + r_i.$$

Fig. 5: Transformation of the one-time encryption of an outgoing wire into the one-time encryption of an incoming wire, using the values computed by F_offline.

Then the protocol proceeds by jointly removing the one-time pads from the two inputs of the current gate and evaluating the gate together, in order to compute a shared output z_i. Note that in this gate evaluation the gate type G_i is secret and shared among the players. This step can be performed using the F_mpc operations. Then the parties compute a one-time encryption of z_i, using the corresponding ℓ value as the pad, and denote the result by u_j. This is just a relabelling, where j is the outgoing wire label of the gate's output wire (note that $j = n + i$, since the outgoing wires are labelled starting with the n input wires and then the output wire of each gate).
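The gate-evaluation step can be sketched as plain field arithmetic. In the protocol the inputs are shared values inside F_mpc; the sketch below evaluates the same expressions on plain integers modulo P, and the function names are hypothetical.

```python
def remove_pad(d, r, P):
    """[y] = d - [r]: strip the incoming-wire pad (done on shares in the MPC)."""
    return (d - r) % P


def eval_gate(G, y0, y1, P):
    """Gate evaluation from Figure 6: z = (1 - G)(y0 + y1) + G*y0*y1 mod P.

    G = 0 selects addition and G = 1 multiplication, so the same sequence of
    Add/Multiply calls is made whatever the (secret) gate type is.
    """
    return ((1 - G) * (y0 + y1) + G * y0 * y1) % P


P = 101
y0, y1 = remove_pad(10, 7, P), remove_pad(2, 99, P)   # y0 = 3, y1 = 4
z_add = eval_gate(0, y0, y1, P)
z_mul = eval_gate(1, y0, y1, P)
```

Because the formula is fixed, the transcript of MPC operations is identical for addition and multiplication gates, which is what hides the gate type.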

Note that if P_1 tries to deviate from the protocol in his local computation (i.e. when he connects outgoing wires to incoming wires), the MACs he generates will not pass the jointly performed verification and he will be caught. In that case, either the protocol aborts (in the case of dishonest majority) or his input (i.e. the function) is revealed (in the case of honest majority).
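Why a misbehaving P_1 is caught can be checked numerically. The sketch below (Python, toy prime field Z_P with P = 2^61 - 1, all values in the clear for illustration) compares an honest P_1 with two cheating strategies: rewiring an incoming wire to the wrong outgoing wire, and adding an offset to d_i. In both cases the gap between P_1's m and the value the parties open is a quantity (t_2 - t_1 or δ·K) that P_1 cannot predict without the shared keys.

```python
import secrets

P = 2**61 - 1   # illustrative prime standing in for F_{p^k}


def rand():
    return secrets.randbelow(P)


K = rand()
t = {1: rand(), 2: rand()}              # MAC keys of outgoing wires 1 and 2
ell = {1: rand(), 2: rand()}            # pads of outgoing wires 1 and 2
x = {1: 111, 2: 222}                    # wire values
u = {j: (x[j] + ell[j]) % P for j in (1, 2)}
v = {j: (t[j] + u[j] * K) % P for j in (1, 2)}

# Incoming wire i is really connected to outgoing wire 1 (pi(i) = 1).
r_i, s_i = rand(), rand()
p_i = (r_i - ell[1]) % P
q_i = ((s_i - t[1]) + p_i * K) % P


def check(d, m):
    """The parties' test: open [s_i] + d*[K] and compare with P_1's m."""
    return (s_i + d * K) % P == m


# Honest P_1.
honest_ok = check((p_i + u[1]) % P, (q_i + v[1]) % P)

# Cheating P_1 rewires input i to outgoing wire 2: the MAC is off by t_2 - t_1.
d_bad, m_bad = (p_i + u[2]) % P, (q_i + v[2]) % P
rewire_gap = (m_bad - (s_i + d_bad * K)) % P

# Cheating P_1 adds an offset delta to d but cannot adjust m without knowing K.
delta = 5
d_off = (p_i + u[1] + delta) % P
offset_gap = ((s_i + d_off * K) - (q_i + v[1])) % P
```

Since t_1, t_2 and K are uniform and hidden from P_1, either cheat passes the check only with probability about 1/p^k.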

This  leads  to  the  following  theorem,  whose  proof  is  given  in  Appendix  F. 

Theorem 1. In the F_offline-hybrid model the protocol in Figure 6 securely implements the PFE functionality in Figure 4, with complexity O(g).


Protocol Π_online

The protocol is described in the F_offline-hybrid model.

Input Function: Player P_1, given f, selects the switching network mapping π and then calls (inputfunction, π, f) on the functionality F_offline.

Input Data: On input (input, P_i, x_i) from player P_i the protocol executes the (input, i, x_i) operation of the functionality F_offline.

Output: The evaluation of the function proceeds as follows, where for ease of exposition we set $x_{\pi(h)} = y_h$ for all h; i.e. if a wire has value x_i on the left (as an outgoing wire), then it has the same value y_h on the right (as an incoming wire), where $i = \pi(h)$.

- Preparing Inputs to the Circuit:

• For each input wire i (1 ≤ i ≤ n) the players execute $[u_i] = [x_i] + [\ell_i]$, where i is the outgoing wire label corresponding to that input wire, and $[v_i] = [t_i] + ([x_i] + [\ell_i]) \cdot [K]$, using the F_mpc functionality available via F_offline.

• The parties then call (output, u_i) and (output, v_i) to open [u_i] and [v_i].

- Evaluating the Circuit: For every gate 1 ≤ i ≤ g in the circuit the players execute the following (here we assume that the gates are indexed in the same topological order that P_1 chose to determine π):

• P_1 Prepares the Two Inputs for Gate i.

* Note that the two input wires of gate i have incoming wire labels $i_0 = 2i - 1$ and $i_1 = 2i$, and the (u, v) values for their corresponding outgoing wire labels are already determined, i.e. $u_{\pi(i_j)}$ and $v_{\pi(i_j)}$ have already been opened for j ∈ {0, 1}.

* Player P_1 computes, for j = 0, 1,

$$d_{i_j} = p_{i_j} + u_{\pi(i_j)} = (r_{i_j} - \ell_{\pi(i_j)}) + (x_{\pi(i_j)} + \ell_{\pi(i_j)}) = y_{i_j} + r_{i_j},$$

$$m_{i_j} = q_{i_j} + v_{\pi(i_j)} = \bigl(t_{\pi(i_j)} + (x_{\pi(i_j)} + \ell_{\pi(i_j)}) \cdot K\bigr) + \bigl((s_{i_j} - t_{\pi(i_j)}) + (r_{i_j} - \ell_{\pi(i_j)}) \cdot K\bigr) = s_{i_j} + (y_{i_j} + r_{i_j}) \cdot K.$$

* Player P_1 then broadcasts the values d_{i_j} and m_{i_j} to all players.

• Players Check P_1's Input Preparation.

* All players then use the F_mpc operations available (via the interface to the F_offline functionality) to store in the F_mpc functionality the values $[n_{i_j}] = [s_{i_j}] + d_{i_j} \cdot [K]$, which for an honest P_1 equal $[s_{i_j}] + (y_{i_j} + r_{i_j}) \cdot [K]$. The value n_{i_j} is then opened to all players by calling (output, n_{i_j}).

* If $n_{i_j} \neq m_{i_j}$, then the players call Cheat(1) on the F_mpc functionality. This will either abort, or return the input of P_1 (and hence the function); in the latter case the players can proceed with evaluating the function using standard MPC, without the need for P_1 to be involved.

• Players Jointly Evaluate Gate i.

* The players store the values $[y_{i_j}] = d_{i_j} - [r_{i_j}]$ in the F_mpc functionality.

* The F_mpc functionality is then executed so as to compute the output of the gate as

$$[z_i] = (1 - [G_i]) \cdot ([y_{i_0}] + [y_{i_1}]) + [G_i] \cdot [y_{i_0}] \cdot [y_{i_1}].$$

* Note that the outgoing wire label corresponding to the output wire of the i-th gate is $j = n + i$, so we just relabel [z_i] to [z_j].

* If gate i is an output gate, the players call (output, z_i) to obtain z_i, disregard the next steps and continue to evaluate the next gate.

* The players compute via the MPC functionality $[u_j] = [z_j] + [\ell_j]$.

* The players call (output, u_j) so as to obtain u_j.

* The players then compute via the MPC functionality

$$[v_j] = [t_j] + u_j \cdot [K] = [t_j + (z_j + \ell_j) \cdot K].$$

* The players call (output, v_j) so as to obtain v_j.

Fig. 6: The Protocol for implementing PFE

4 Implementing F_offline with Linear Complexity

In this section we give a linear instantiation of the offline phase of the framework. Since our online phase has linear complexity, a linear implementation of the offline phase leads to a linear actively secure PFE. The main challenge in obtaining a linear solution is to design a linear method for applying the extended permutation π to the values $\{[\ell_i]\}$ and $\{[t_i]\}$ to produce the shared values $\{[\ell_{\pi(i)}]\}$ and $\{[t_{\pi(i)}]\}$. In the semi-honest case [17], a linear complexity solution to this problem is achieved by employing a singly homomorphic encryption scheme. The shared values are jointly encrypted; P_1 applies the extended permutation to the resulting ciphertexts and re-randomizes them in order to hide π; the parties then jointly decrypt in order to obtain shares of the resulting plaintexts. To obtain active security, we need to make each step of the following computation actively secure:

1. The players encrypt the shared inputs (all of which lie in F_{p^k}) using an encryption scheme, with respect to a public key for which the players can execute a distributed decryption protocol. The resulting ciphertexts are sent to P_1.

2. Player P_1 applies the EP, re-randomizes the ciphertexts and sends them back. He then uses the ZK_EP protocol to prove that his operation has been done correctly.

3. The players then decrypt the permuted ciphertexts and recover shares of the plaintexts.

To implement the first and last steps we use an instantiation based on ElGamal encryption; see Appendix A. The middle step is more tricky, and we devote the rest of this section to describing it. For the middle step we need a linear zero-knowledge protocol to prove that $P_1$ applied a valid EP to the ciphertexts. Proof of a correct shuffle is a well-studied problem in the context of mix-nets, and linear solutions for it exist [11]. As discussed in Appendix B, however, extending these linear proofs to the case of extended permutations faces some subtle difficulties, which we leave as an open question. Instead we aim for a more general construction that uses the currently available proofs of shuffle in a black-box way.
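The three steps above can be sketched with textbook ElGamal. This is an illustration under stated assumptions, not the instantiation of Appendix A: it uses a toy multiplicative subgroup of prime order $q = 1019$ in $\mathbb{Z}_{2039}^*$ instead of an elliptic curve, a single secret key instead of distributed decryption, and no zero-knowledge proof. The function names are hypothetical; `apply_ep` plays the role of $P_1$, with `pi[i]` naming the input copied to output `i`.

```python
import secrets

# Toy parameters (assumption): p = 2q + 1 is a safe prime and g = 4 generates
# the subgroup of prime order q. Real deployments use far larger groups.
p, q, g = 2039, 1019, 4


def keygen():
    sk = secrets.randbelow(q - 1) + 1
    return sk, pow(g, sk, p)


def encrypt(h, m):
    # m must lie in the order-q subgroup; the ciphertext is (g^r, m * h^r).
    r = secrets.randbelow(q)
    return pow(g, r, p), (m * pow(h, r, p)) % p


def rerandomize(h, ct):
    # Multiply by a fresh encryption of 1: same plaintext, fresh randomness.
    a, b = ct
    r = secrets.randbelow(q)
    return (a * pow(g, r, p)) % p, (b * pow(h, r, p)) % p


def decrypt(sk, ct):
    a, b = ct
    return (b * pow(a, -sk, p)) % p  # negative exponents need Python 3.8+


def apply_ep(h, cts, pi):
    # Output i is a re-randomized copy of input pi[i]; an input may be
    # copied several times or not at all.
    return [rerandomize(h, cts[j]) for j in pi]


sk, h = keygen()
msgs = [pow(g, e, p) for e in (3, 5, 7)]
out = apply_ep(h, [encrypt(h, m) for m in msgs], [2, 0, 0, 1])
print([decrypt(sk, c) for c in out] == [msgs[2], msgs[0], msgs[0], msgs[1]])  # True
```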


4.1 Linear $\mathcal{ZK}_{EP}$ Protocol

After the players compute the encryptions of the shared inputs, $P_1$, knowing the circuit topology, applies the corresponding extended permutation to the ciphertexts. He then re-randomizes the ciphertexts and "opens" them. Next, we give a linear zero-knowledge protocol $\mathcal{ZK}_{EP}$, which enables $P_1$ to prove the correctness of his operation (i.e. that the final ciphertexts are the result of $P_1$ applying a valid EP to the input ciphertexts). As a first attempt we considered extending existing linear proofs of shuffle to obtain linear proofs of extended permutation. While plausible, this approach raises subtle difficulties that need to be addressed. For more details on our attempt to extend the method of Furukawa [11,10], refer to Appendix B; we leave this approach as an open problem. Instead we give a more general construction which makes black-box calls to a proof of shuffle. This construction is inspired by the switching-network construction of EP given in [17], which we first revisit.

Assume the EP mapping is represented by the function $\pi : \{1, \ldots, n\} \to \{1, \ldots, m\}$, which maps $m$ input wires to $n$ output wires ($n > m$). Note that in this section we use $n$ and $m$ to denote the size of the EP. In a switching network the numbers of inputs and outputs are the same; therefore the construction takes the $m$ real inputs of the EP and $n - m$ additional dummy inputs. The construction is divided into three components, each of which takes the output of the previous one as input. Instead of applying the EP in one step, $P_1$ applies each component separately and uses a zero-knowledge protocol to prove its correctness. Figure 7 illustrates the components. Next, we describe each component and identify the required ZK proof.

Table 1 lists the zero-knowledge protocols of which we make black-box use in our $\mathcal{ZK}_{EP}$ protocol. Note that we use $P$ and $Q$ for our elliptic-curve instantiation instead of $g$ and $h$.

— Dummy-value placement component: This takes the real and dummy ciphertexts as input and, for each ciphertext of a real value that is mapped to $k$ different outputs according to $\pi$, outputs the real ciphertext followed by $k - 1$ dummy ciphertexts. This is repeated for each real ciphertext. The resulting output ciphertexts are all re-randomized. The dummy-placement step can be seen as a shuffling of the input ciphertexts, so we use a proof of correct shuffle, $\mathcal{ZK}_{SHUFFLE}$, for the correctness of this component.

— Replication component: This takes the output of the previous component as input. It directly outputs each real ciphertext but replaces each dummy ciphertext with an encryption of the real input that precedes it. At the end of this step, we have the necessary copies of each real input and the dummy inputs are eliminated. Naturally, all the ciphertexts are re-randomized. To prove correctness of this step, we need ZK proofs that the $i$-th output ciphertext has a plaintext equal to that of either the $i$-th input ciphertext or the $(i-1)$-th output ciphertext (these can be achieved using the protocol $\mathcal{ZK}_{EQ}$ defined in Table 1 as a building block). But this is not sufficient to guarantee a correct EP, as we also have to make sure that no dummy ciphertexts are left after the replication component. For this, we assume that all dummy ciphertexts are encryptions of one. Then for each output ciphertext of the replication component we use a protocol $\mathcal{ZK}_{NO}$, i.e. a ZK proof that the underlying plaintext is not one. The $\mathcal{ZK}_{REP}$ zero-knowledge protocol is thus a combination of three ZK protocols: two checking equality of plaintexts and one checking that the plaintext is not one.

— Permutation component: This takes the output of the replication component as input and permutes each element to its final location as prescribed by $\pi$. We again use the proof of correct shuffle, $\mathcal{ZK}_{SHUFFLE}$, for this component.

[Figure 7: Dummy Placement Phase → Replication Phase → Permutation Phase]

Fig. 7: EP construction. The components' names are written underneath. The zero-knowledge protocol for each component is written inside its component box.

| ZK Protocol | Relation/Language | Ref. |
|---|---|---|
| $\mathcal{ZK}_{SHUFFLE}(\{ct_i\}, \{ct'_i\})$ | $\mathcal{L}_{Shuffle} = \{(G, g, h, \{ct_i\}, \{ct'_i\}) \mid \exists \pi, \{r_i\} \text{ s.t. } \bigwedge_i (C'^{(i)}_1 = g^{r_i} \cdot C^{(\pi(i))}_1 \wedge C'^{(i)}_2 = h^{r_i} \cdot C^{(\pi(i))}_2) \wedge \pi \text{ is a perm.}\}$ | [11] |
| $\mathcal{ZK}_{EQ}(ct_1, ct_2)$ | $\mathcal{L}_{EQ} = \{(G, g, h, ct_i = (\alpha_i, \beta_i)_{i \in \{1,2\}}) \mid \exists (m_1, m_2, r_1, r_2) \text{ s.t. } \alpha_i = g^{r_i} \wedge \beta_i = m_i \cdot h^{r_i} \wedge m_1 = m_2\}$ | [5] |
| $\mathcal{ZK}_{NO}(ct)$ | $\mathcal{L}_{NO} = \{(G, g, h, ct = (\alpha, \beta)) \mid \exists (m \neq 1, r) \text{ s.t. } \alpha = g^{r} \wedge \beta = m \cdot h^{r}\}$ | [13] |

Table 1: List of zero-knowledge protocols used in our $\mathcal{ZK}_{EP}$ protocol. Generator $g$ and public key $h = g^{sk}$.
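The three components can be checked at the plaintext level. The sketch below is a hypothetical model, not the protocol itself: ciphertexts are replaced by their plaintexts, `DUMMY` stands for an encryption of one, `pi[i]` gives the input copied to output `i`, and it assumes every real input is used at least once so that no unused input has to be discarded. It confirms that dummy placement, replication and a final (bijective) permutation together compute the extended permutation.

```python
from collections import Counter

DUMMY = None  # stands in for an encryption of one


def dummy_placement(real, pi):
    """Place each real input j, used k_j times by pi, followed by k_j - 1 dummies.

    Assumes pi is onto (every real input is used), so len(output) == len(pi).
    """
    counts = Counter(pi)
    assert set(counts) == set(range(len(real))), "sketch assumes pi is onto"
    out = []
    for j, value in enumerate(real):
        out.append(value)
        out.extend([DUMMY] * (counts[j] - 1))
    return out


def replication(values):
    # Replace every dummy by the value just before it (the first slot is never a dummy).
    out = []
    for v in values:
        out.append(out[-1] if v is DUMMY else v)
    return out


def final_permutation(replicated, real, pi):
    """Move the c-th copy of input j (at index start[j] + c) to its output slot."""
    start, pos = {}, 0
    for j in range(len(real)):
        start[j] = pos
        pos += pi.count(j)
    used = Counter()
    out = []
    for j in pi:
        out.append(replicated[start[j] + used[j]])
        used[j] += 1
    return out


real, pi = ["a", "b", "c"], [2, 0, 0, 1, 0]
staged = replication(dummy_placement(real, pi))
print(final_permutation(staged, real, pi) == [real[j] for j in pi])  # True
```

Only the first and last stages reorder positions, which is why a proof of shuffle suffices for them, while the middle stage changes contents and needs the equality and non-dummy checks.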


$\mathcal{ZK}_{EP}$ Protocol description. We assume the inputs to $\mathcal{ZK}_{EP}$ to be the outputs of our encryption functionality. The prover applies the extended permutation to the ciphertexts $(ct_1, \ldots, ct_n)$, where $ct_i = (C^{(i)}_1, C^{(i)}_2)$, and obtains the re-randomized ciphertexts $(ct'_1, \ldots, ct'_n)$, where $ct'_i = (C'^{(i)}_1, C'^{(i)}_2)$. We employ the techniques of Cramer et al. [6] to combine the HVZK proof systems corresponding to each component, at no extra cost, into HVZK proof systems of the same class for any (monotone) disjunctive and/or conjunctive formula over statements proved in the component proof systems. Figure 8 shows the complete description of our $\mathcal{ZK}_{EP}$ protocol. Note that we can choose dummy values from any set of random values $S_d$ and substitute $\mathcal{ZK}_{NO}(x)$ with $\bigvee_{y \in S_d}(\mathcal{ZK}_{EQ}(x, y))$.
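Table 1 cites [5] for $\mathcal{ZK}_{EQ}$. One standard way to realize such a proof, shown here as an assumption rather than as the construction of [5], is the Chaum-Pedersen protocol: the quotient of the two ciphertexts encrypts one exactly when $m_1 = m_2$, so it suffices to prove $\log_g A = \log_h B$ for $(A, B) = (\alpha_1/\alpha_2, \beta_1/\beta_2)$. The toy group parameters and function names are hypothetical, and the challenge `c` is supplied by an honest verifier, matching the HVZK setting.

```python
import secrets

# Toy group (assumption): p = 2q + 1, g = 4 has prime order q, h = g^sk.
p, q, g = 2039, 1019, 4


def quotient(ct1, ct2):
    # (alpha1/alpha2, beta1/beta2) encrypts m1/m2, i.e. it encrypts 1 iff m1 == m2.
    (a1, b1), (a2, b2) = ct1, ct2
    return (a1 * pow(a2, -1, p)) % p, (b1 * pow(b2, -1, p)) % p


def eq_commit(h):
    # Prover's first message: commitments g^w and h^w for a fresh w.
    w = secrets.randbelow(q)
    return w, (pow(g, w, p), pow(h, w, p))


def eq_respond(w, r, c):
    # Witness r = r1 - r2 mod q satisfies A = g^r and B = h^r.
    return (w + c * r) % q


def eq_verify(h, ct1, ct2, commit, c, z):
    A, B = quotient(ct1, ct2)
    t1, t2 = commit
    return (pow(g, z, p) == (t1 * pow(A, c, p)) % p
            and pow(h, z, p) == (t2 * pow(B, c, p)) % p)


sk, m, r1, r2 = 77, pow(g, 10, p), 123, 456
h = pow(g, sk, p)
ct1 = (pow(g, r1, p), (m * pow(h, r1, p)) % p)
ct2 = (pow(g, r2, p), (m * pow(h, r2, p)) % p)
w, commit = eq_commit(h)
c = secrets.randbelow(q)  # honest verifier's challenge
print(eq_verify(h, ct1, ct2, commit, c, eq_respond(w, (r1 - r2) % q, c)))  # True
```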

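Theorem 2. The protocol described in Figure 8 is an HVZK proof of an extended permutation $\pi$ between $(ct_1, \ldots, ct_n)$ and $(ct'_1, \ldots, ct'_n)$ in the $(\mathcal{ZK}_{SHUFFLE}, \mathcal{ZK}_{EQ}, \mathcal{ZK}_{NO})$-hybrid model, for the following relation:

$$\mathcal{L}_{EP} = \left\{(G, g, h, \{ct_i\}, \{ct'_i\}) \mid \exists \pi, \{r'_i\} \text{ s.t. } \bigwedge_i \left(C'^{(i)}_1 = g^{r'_i} \cdot C^{(\pi(i))}_1 \wedge C'^{(i)}_2 = h^{r'_i} \cdot C^{(\pi(i))}_2\right) \wedge \pi \text{ is an EP}\right\}.$$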


Proof. We give a proof sketch. We show that if the construction of EP from [17] is correct, then the $\mathcal{ZK}_{EP}$ protocol is an HVZK proof for EP. The goal of the first two components is to prepare enough copies of each element. This implies that after the second component no dummy elements should remain and no new elements should have been introduced; $\mathcal{ZK}_{SHUFFLE}$ and $\mathcal{ZK}_{REP}$ guarantee these two properties. $\mathcal{ZK}_{SHUFFLE}$ makes sure no additional elements are introduced in the first component. $\mathcal{ZK}_{REP}$ ensures that each output element of the second component equals one of the two candidate inputs for its position (the corresponding input or the previous output), which makes sure no new elements are introduced in this step. Furthermore, it uses $\mathcal{ZK}_{NO}$ to check that no dummy elements remain. Note that the EP construction does not require the dummy-placement phase to arrange the elements in any particular order: as long as the two properties above are satisfied, applying any permutation component results in a valid EP, and hence a valid circuit topology. $\mathcal{ZK}_{SHUFFLE}$ is used to check the final component. This completes the sketch. Finally, we employ the techniques of Cramer et al. [6] to combine the HVZK proof systems corresponding to each component, at no extra cost, into HVZK proof systems of the same class. Note that we make only black-box calls to the underlying ZK proof systems.

Protocol $\mathcal{ZK}_{EP}(\{ct_i\}, \{ct'_i\})$

Shared Input: Ciphertexts $(ct_1, \ldots, ct_n)$

$P_1$'s Input: Extended permutation $\pi$

$P_1$ evaluates the components.

— Player $P_1$ finds the corresponding permutations $\pi_1$ and $\pi_2$ for the Dummy-placement and Permutation components.

— $P_1$ applies the Dummy-placement component to $(ct_1, \ldots, ct_n)$ and re-randomizes to obtain $(ct^{(1)}_1, \ldots, ct^{(1)}_n)$.

— $P_1$ applies the Replication component to $(ct^{(1)}_1, \ldots, ct^{(1)}_n)$ and re-randomizes to obtain $(ct^{(2)}_1, \ldots, ct^{(2)}_n)$.

— $P_1$ applies the Permutation component to $(ct^{(2)}_1, \ldots, ct^{(2)}_n)$ and re-randomizes to obtain $(ct'_1, \ldots, ct'_n)$.

$P_1$ computes the ZK proofs and sends everything.

— Player $P_1$ uses the $\mathcal{ZK}_{SHUFFLE}(\{ct_i\}, \{ct^{(1)}_i\})$ and $\mathcal{ZK}_{SHUFFLE}(\{ct^{(2)}_i\}, \{ct'_i\})$ protocols to produce proofs of correctness for his evaluation of the Dummy-placement and Permutation components.

— Player $P_1$ uses $\mathcal{ZK}_{REP}(\{ct^{(1)}_i\}, \{ct^{(2)}_i\})$ to produce a proof of correctness for his evaluation of the Replication component as follows (using [6] for the combination), with $\mathcal{ZK}^{(1)}_{REP} = \mathcal{ZK}_{NO}(ct^{(2)}_1) \wedge \mathcal{ZK}_{EQ}(ct^{(1)}_1, ct^{(2)}_1)$:

  • For $2 \le i \le n$: $\mathcal{ZK}^{(i)}_{REP} = \left(\mathcal{ZK}_{EQ}(ct^{(1)}_i, ct^{(2)}_i) \vee \mathcal{ZK}_{EQ}(ct^{(2)}_{i-1}, ct^{(2)}_i)\right) \wedge \mathcal{ZK}_{NO}(ct^{(2)}_i)$

  • $\mathcal{ZK}_{REP} = \bigwedge_{i=1}^{n} \mathcal{ZK}^{(i)}_{REP}$

— Player $P_1$ sends $(ct^{(1)}_1, \ldots, ct^{(1)}_n)$, $(ct^{(2)}_1, \ldots, ct^{(2)}_n)$, $(ct'_1, \ldots, ct'_n)$ and all proofs to the other players.

Players verify $P_1$'s operations.

— The players verify $P_1$'s operations by verifying the proofs sent by $P_1$.

Fig. 8: The protocol for zero-knowledge proof of extended permutation.
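The disjunction inside $\mathcal{ZK}^{(i)}_{REP}$ is where the techniques of Cramer et al. [6] come in. The following is a minimal sketch of the idea, assuming each $\mathcal{ZK}_{EQ}$ statement has already been reduced to a pair $(A, B)$ with $A = g^r$, $B = h^r$ (as in a Chaum-Pedersen proof) and using a toy group. The prover simulates the branch it cannot prove, picks the real branch's challenge so that the two sub-challenges sum to the verifier's `c`, and the verifier checks both transcripts. Function names and parameters are hypothetical, and the $\mathcal{ZK}_{NO}$ conjunct is not covered.

```python
import secrets

# Toy group (assumption): p = 2q + 1, g = 4 of prime order q, h = g^sk.
p, q, g = 2039, 1019, 4


def dh_check(h, A, B, t, c, z):
    # Chaum-Pedersen check for the statement log_g A == log_h B.
    return (pow(g, z, p) == (t[0] * pow(A, c, p)) % p
            and pow(h, z, p) == (t[1] * pow(B, c, p)) % p)


def or_commit(h, statements, real, r):
    """First message for statements[0] OR statements[1]; r is a witness for statements[real]."""
    sim = 1 - real
    A, B = statements[sim]
    # Simulate the other branch: choose (c_sim, z_sim) first, then solve for t_sim.
    c_sim, z_sim = secrets.randbelow(q), secrets.randbelow(q)
    t_sim = ((pow(g, z_sim, p) * pow(A, -c_sim, p)) % p,
             (pow(h, z_sim, p) * pow(B, -c_sim, p)) % p)
    w = secrets.randbelow(q)
    commits = [None, None]
    commits[real], commits[sim] = (pow(g, w, p), pow(h, w, p)), t_sim
    return commits, (real, w, r, c_sim, z_sim)


def or_respond(state, c):
    real, w, r, c_sim, z_sim = state
    c_real = (c - c_sim) % q  # the two sub-challenges must sum to c
    cs, zs = [0, 0], [0, 0]
    cs[real], zs[real] = c_real, (w + c_real * r) % q
    cs[1 - real], zs[1 - real] = c_sim, z_sim
    return cs, zs


def or_verify(h, statements, commits, c, cs, zs):
    if (cs[0] + cs[1]) % q != c % q:
        return False
    return all(dh_check(h, A, B, commits[k], cs[k], zs[k])
               for k, (A, B) in enumerate(statements))


sk, r = 33, 200
h = pow(g, sk, p)
statements = [(pow(g, 5, p), pow(h, 6, p)), (pow(g, r, p), pow(h, r, p))]  # only branch 1 holds
commits, state = or_commit(h, statements, 1, r)
c = secrets.randbelow(q)
print(or_verify(h, statements, commits, c, *or_respond(state, c)))  # True
```

The verifier cannot tell which branch was simulated, which is what lets $P_1$ hide whether position $i$ holds a fresh input or a replicated copy.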

Offline Protocol. Having all the parts of the puzzle, we can give the complete $O(g)$ protocol for the offline phase. Figure 9 shows the description; the proof of security is given in Appendix C.

5 A Practical Implementation of $\mathcal{F}_{OFFLINE}$ with $O(g \cdot \log g)$ Complexity

An $O(g \cdot \log g)$ protocol implementing $\mathcal{F}_{OFFLINE}$ is given in Figure 13 and Figure 14 (see Appendix D); it is in the $\mathcal{F}_{MPC}$-hybrid model. Following the ideas in [17], we implement the functionality via secure evaluation of a switching network corresponding to the mapping $\pi_f$.

Switching Networks. A switching network SN is a set of interconnected switches that takes $N$ inputs and a set of selection bits, and outputs $N$ values. Each switch in the network accepts two $\ell$-bit strings as input and outputs two $\ell$-bit strings. In this paper we use a switching network that contains two switch types. In the first type (type 1), if the selection bit is 0 the two inputs remain intact and are fed directly to the two outputs, but if the selection bit is 1 the two input values swap places. In the second type (type 2), if the selection bit is 0 the inputs are, as before, fed directly to the outputs, but if it is 1 the value of the first input is used for both outputs. For ease of exposition, in our protocol description we assume that all switches are of type 1, but the protocol can easily be extended to work with both switch types.
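The two switch types can be modelled directly. In this illustrative sketch, a network is a hypothetical flat list of `(kind, wire_a, wire_b)` switches evaluated in order, which is not the layered construction of [17]. Pushing the labels $0, \ldots, N-1$ through the network recovers the mapping: entry `j` of `mapping(...)` is the input wire whose value reaches output wire `j`, matching the definition of $\pi$ for a switching network.

```python
def switch(kind, bit, x0, x1):
    """Evaluate one switch on inputs (x0, x1).

    type 1: bit 0 passes the inputs through, bit 1 swaps them.
    type 2: bit 0 passes the inputs through, bit 1 copies x0 to both outputs.
    """
    if bit == 0:
        return x0, x1
    if kind == 1:
        return x1, x0
    if kind == 2:
        return x0, x0
    raise ValueError("unknown switch type")


def evaluate(network, bits, inputs):
    # network: list of (kind, wire_a, wire_b), applied in order to a value vector.
    values = list(inputs)
    for (kind, a, b), bit in zip(network, bits):
        values[a], values[b] = switch(kind, bit, values[a], values[b])
    return values


def mapping(network, bits, n):
    # Entry j is the input wire whose value reaches output wire j.
    return evaluate(network, bits, range(n))


net, bits = [(1, 0, 1), (2, 1, 2)], [1, 1]
print(mapping(net, bits, 3))                 # [1, 0, 0]
print(evaluate(net, bits, ["x", "y", "z"]))  # ['y', 'x', 'x']
```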
Linear Implementation of Protocol $\Pi_{OFFLINE\text{-}Linear}$

The protocol is described in the $\mathcal{F}_{MPC}$-hybrid model, thus the only operation we need to specify is the Input Function.

Input Function:

$P_1$ Shares his Circuit/Function.

— Player $P_1$ calls (input, $G_j$) for all $j$.

— The players evaluate and open $[G_j] \cdot (1 - [G_j])$ for $j \in \{1, \ldots, g\}$. If any of them is not 0, the players abort (since in this case $P_1$ has not entered a valid function).

Players Generate Randomness for inputs and outputs of EP.

— The players call (random, ·) of $\mathcal{F}_{MPC}$ to generate shared random values for the inputs $\ell = ([\ell_1], \ldots, [\ell_{ow(f)}])$ and outputs $([r_1], \ldots, [r_{iw(f)}])$ of EP.

— The players call (random, ·) of $\mathcal{F}_{MPC}$ to generate shared random values for the MAC values corresponding to the inputs $t = ([t_1], \ldots, [t_{ow(f)}])$ and outputs $([s_1], \ldots, [s_{iw(f)}])$ of EP.

$P_1$ applies the EP to $\ell$ and $t$.

— The players call KeyGen on the $\mathrm{Enc}_{Elg}$ functionality.

— The players call Encrypt on the $\mathrm{Enc}_{Elg}$ functionality with the plaintexts $([\ell_1], \ldots, [\ell_{ow(f)}])$ and the plaintexts $([t_1], \ldots, [t_{ow(f)}])$, to obtain ciphertexts $ct_1, \ldots, ct_{ow(f)}$ and $ct^t_1, \ldots, ct^t_{ow(f)}$.

— Player $P_1$ applies the extended permutation to $(ct_1, \ldots, ct_{ow(f)})$ and re-randomizes to get $(ct'_1, \ldots, ct'_{iw(f)})$; the same is done with $(ct^t_1, \ldots, ct^t_{ow(f)})$ to obtain $(ct'^t_1, \ldots, ct'^t_{iw(f)})$.

— Player $P_1$ uses $\mathcal{ZK}_{EP}$ to prove that he has used a valid extended permutation.

— The players call Decrypt on the $\mathrm{Enc}_{Elg}$ functionality (Figure 11) with ciphertexts $(ct'_1, \ldots, ct'_{iw(f)})$ and $(ct'^t_1, \ldots, ct'^t_{iw(f)})$ so as to obtain $([\ell_{\pi(1)}], \ldots, [\ell_{\pi(iw(f))}])$ and $([t_{\pi(1)}], \ldots, [t_{\pi(iw(f))}])$.

Players Compute $p_i, q_i$.

— For $i \in \{1, \ldots, iw(f)\}$ the players call $\mathcal{F}_{MPC}$ to compute and open

$$[p_i] = [r_i] - [\ell_{\pi(i)}], \qquad [q_i] = [s_i] - [t_{\pi(i)}] + p_i \cdot [K].$$

Fig. 9: The protocol for linear implementation of the Offline Phase
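The last step of Figure 9 can be sanity-checked in the clear. Assuming the relations $p_i = r_i - \ell_{\pi(i)}$ and $q_i = s_i - t_{\pi(i)} + p_i \cdot K$, and assuming an online phase that adds them to the pair $(u, v) = (z + \ell_{\pi(i)}, t_{\pi(i)} + u \cdot K)$ opened as in Figure 6, the result is a correctly masked value and MAC under the input-wire randomness $(r_i, s_i)$. The field modulus and function names are hypothetical.

```python
# Hypothetical modulus standing in for F_{p^k}.
P_FIELD = 2**61 - 1


def offline_differences(r_i, s_i, ell_pi, t_pi, K):
    # Figure 9 (as reconstructed): p_i = r_i - ell_{pi(i)}, q_i = s_i - t_{pi(i)} + p_i*K.
    p_i = (r_i - ell_pi) % P_FIELD
    q_i = (s_i - t_pi + p_i * K) % P_FIELD
    return p_i, q_i


def remask(u, v, p_i, q_i):
    # Assumed online use: move an opened pair from output-wire masks to input-wire masks.
    return (u + p_i) % P_FIELD, (v + q_i) % P_FIELD


K, z = 999, 31
ell, t = 1000, 2000   # masks of output wire pi(i)
r, s = 5000, 7000     # masks of input wire i
u = (z + ell) % P_FIELD
v = (t + u * K) % P_FIELD
u2, v2 = remask(u, v, *offline_differences(r, s, ell, t, K))
print(u2 == (z + r) % P_FIELD, v2 == (s + u2 * K) % P_FIELD)  # True True
```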
The mapping $\pi : \{1, \ldots, N\} \to \{1, \ldots, N\}$ corresponding to a switching network SN is defined such that $\pi(j) = i$ if and only if, after evaluation of SN on the $N$ inputs, the value of input wire $i$ is assigned to output wire $j$ (assuming a standard numbering of the input/output wires). In [17] it is shown how to represent any mapping with at most $N$ inputs and outputs via a network with $O(N \cdot \log N)$ type 1 and type 2 switches (we refer the reader to [17] for the details). This yields a switching network with $O(g \cdot \log g)$ switches representing the mapping for a circuit with $g$ gates.

High Level Description. It is possible to implement $\mathcal{F}_{OFFLINE}$ by securely computing a circuit for the above switching network using $\mathcal{F}_{MPC}$. But for all existing MPC protocols that meet our requirements, this would require $O(\log g)$ rounds of interaction, since that is the depth of the circuit corresponding to the switching network. We show an alternative constant-round approach with similar computation and communication efficiency. It follows the same idea as the OT-based protocol of [17], where the OT is replaced with an equivalent functionality implemented using $\mathcal{F}_{MPC}$. The main challenge in our case is to achieve active security, and in particular to ensure that $P_1$ cannot cheat in his local computation. We do so by checking $P_1$'s actions using one-time MACs of the values he computes on, and by allowing the other parties to learn his input and proceed without him if he is caught cheating (or aborting).

Next we give an overview of the protocol. The protocol has four main components (as described in Figure 13 and Figure 14). In the first step, $P_1$ converts his mapping $\pi$ to selection bits for the switching network (i.e. the $b_i$'s) and shares them with all players. He also shares with the other players a bit $G_i$ indicating the function of gate $i$. In the second step, the players generate random values for every wire in the network. Based on his selection bit for each switch, $P_1$ learns two of the four possible "subtractions" of the random values for the two output wires from those of the input wires, i.e. $u^{\ell,i}_0$ and $u^{\ell,i}_1$. A similar process is performed for the $t$ values to obtain $u^{t,i}_0$ and $u^{t,i}_1$ (Figure 10 shows this process in a diagram). These subtractions enable $P_1$ to transform a pair of values blinded with the random values of the input wires into the same pair of values, permuted according to the selection bit and blinded with the random values of the output wires. All of the above can be implemented using the operations provided by $\mathcal{F}_{MPC}$.
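A single type-1 switch can be sketched as follows. The sign convention (output mask minus input mask, which $P_1$ adds) is an assumption, and the selection bit appears in the clear here, whereas in the protocol the subtractions are obtained through $\mathcal{F}_{MPC}$ so that the other players never learn it. The modulus and helper names are hypothetical.

```python
# Hypothetical prime modulus for the sketch.
P_FIELD = 2**61 - 1


def subtractions(bit, a0, a1, c0, c1):
    """The two differences P_1 obtains for a type-1 switch with selection bit `bit`.

    a0, a1: random masks of the input wires; c0, c1: masks of the output wires.
    Of the four differences c_x - a_y, only the two matching `bit` are returned.
    """
    if bit == 0:
        return (c0 - a0) % P_FIELD, (c1 - a1) % P_FIELD
    return (c0 - a1) % P_FIELD, (c1 - a0) % P_FIELD


def local_switch(bit, y0, y1, u0, u1):
    # P_1 re-blinds y0 = x0 + a0 and y1 = x1 + a1 under the output masks.
    if bit == 0:
        return (y0 + u0) % P_FIELD, (y1 + u1) % P_FIELD
    return (y1 + u0) % P_FIELD, (y0 + u1) % P_FIELD


x0, x1, a0, a1, c0, c1 = 10, 20, 111, 222, 333, 444
for bit in (0, 1):
    o0, o1 = local_switch(bit, x0 + a0, x1 + a1, *subtractions(bit, a0, a1, c0, c1))
    print(bit, (o0 - c0) % P_FIELD, (o1 - c1) % P_FIELD)  # "0 10 20", then "1 20 10"
```

Chaining this per switch is all $P_1$ does locally in the third step; since every output is masked by fresh output-wire randomness, the values he sees reveal nothing about the underlying data.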


[Figure 10: diagram of the $i$-th switch]

Fig. 10: The $i$-th switch (superscripts: label of the value subject to permutation ($\ell$ or $t$) and switch index $i$; subscripts: $d$ refers to data, $m$ refers to MAC, wire index 0 denotes the top wire in the switch and 1 the bottom wire).

In the third step, $P_1$ obtains the blinded $\ell$ and $t$ values, where the blinding for each is the random value of the corresponding input wire to the network (these are …, etc.). Party $P_1$ can now process each switch as discussed above, using the subtraction values, in order to evaluate the entire network. At the end of this process, $P_1$ holds blinded values of the outputs of the switching network (blinded with the randomness of the output wires).

In the final step, the parties check that $P_1$ has not cheated during his evaluation, since he performed this step locally and not through $\mathcal{F}_{MPC}$ operations. We use one-time MACs to achieve this goal. In particular, besides mapping blinded values through the network, $P_1$ also maps the corresponding one-time MACs (generated using the fixed key $K$). This is done using a process similar to the one described above, via the subtraction values obtained for the $t$ values. At the end of this process, $P_1$ holds one-time MACs for the blinded outputs of the switching network, in addition to the values themselves. The players then use the MPC functionality to jointly verify that the MACs indeed verify the values $P_1$ shared with them (i.e. that each mapped value and its mapped MAC are consistent under $K$). As a result, $P_1$ can only cheat by forging the MACs, which happens only with negligible probability. If the MACs pass, the parties compute and open the "difference vectors" by subtracting the mapped $\ell$- and $t$-value vectors from the $r$- and $s$-value vectors. Refer to Figure 13 and Figure 14 for more details. If one instantiates $\mathcal{F}_{MPC}$ by SPDZ [8], which has $m \cdot \log(p^k)$ complexity, then our complexity would be $m \cdot (10(2^s \log 2^s - 2^s + 1) + 4g) \cdot \log(p^k)$. Refer to Appendix E for the proof of the following theorem.
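The claim that $P_1$ can only cheat by forging a MAC, with negligible probability, can be illustrated by enumeration. Assuming a one-time MAC of the form $t + K \cdot y$ (the same shape as $v_i = t_i + u_i \cdot K$ in Figure 6) over a tiny field chosen so that every key can be listed, exactly one key accepts any forged pair $(y', \tau')$ with $y' \neq y$, so a forgery succeeds with probability $1/|\mathbb{F}|$. The names and the field size are hypothetical.

```python
P_SMALL = 101  # tiny prime so every key can be enumerated (illustration only)


def tag(t, K, y):
    # One-time MAC: t is a fresh one-time key, K the fixed global key.
    return (t + K * y) % P_SMALL


def forgery_success_rate(y, observed_tag, y_forged, tag_forged):
    """Fraction of keys consistent with (y, observed_tag) that accept the forgery.

    For each K exactly one t matches the observed tag, so we enumerate K
    and solve for t.
    """
    accepted = 0
    for K in range(P_SMALL):
        t = (observed_tag - K * y) % P_SMALL
        accepted += tag(t, K, y_forged) == tag_forged
    return accepted / P_SMALL


print(forgery_success_rate(5, 17, 6, 40) == 1 / P_SMALL)  # True
```

With a field of cryptographic size, $1/|\mathbb{F}|$ is negligible, which is the bound the argument above relies on.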

Theorem 3. In the $\mathcal{F}_{MPC}$-hybrid model the protocol $\Pi_{OFFLINE}$ in Figure 13 and Figure 14 securely implements the functionality in Figure 2, with complexity $O(g \cdot \log g)$.
A Instantiating Shared Encryption/Decryption

Recall that our messages are elements of $\mathbb{F}_{p^k}$ and that we aim to work in an elliptic curve group of prime order (to ensure DDH holds in the whole group). We therefore consider the finite field $\mathbb{F}_{p^{2k}} = \mathbb{F}_{p^k}[\theta]$ and an elliptic curve $E(\mathbb{F}_{p^{2k}})$ of prime order $q$ with generator $P$. Let the curve be given by the equation $Y^2 = X^3 + A \cdot X + B$, where $A, B \in \mathbb{F}_{p^{2k}}$. To encrypt an element $m \in \mathbb{F}_{p^k}$ we map elements of $\mathbb{F}_{p^k}$ to elliptic curve points as follows. We pick a random $r \in \mathbb{F}_{p^k}$ and set $x = m + r \cdot \theta$. If $t = x^3 + A \cdot x + B$ is a square (which can be tested by checking whether $t^{(p^{2k}-1)/2} = 1$), we extract the square root $y$ (by the Tonelli-Shanks algorithm) and return $M = (x, y)$; otherwise we pick another $r$ and repeat the operation. We expect this process to terminate after two steps on average.
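The point-encoding procedure can be sketched for the smallest case $k = 1$. The sketch assumes $p = 23$ and $\mathbb{F}_{p^2} = \mathbb{F}_p[\theta]/(\theta^2 + 1)$ (valid because $-1$ is a non-residue modulo 23), uses the hypothetical curve coefficients $A = B = 1$ without checking that the group order is prime, and replaces Tonelli-Shanks by brute force, which is only feasible for a 529-element field. Elements $a + b\theta$ are stored as pairs `(a, b)`.

```python
import secrets

# Toy instance with k = 1 (assumption): F_{p^2} = F_p[theta]/(theta^2 + 1), p = 23.
p = 23
NONRES = p - 1            # theta^2 = -1; -1 is a non-residue since p = 3 (mod 4)
A, B = (1, 0), (1, 0)     # hypothetical curve Y^2 = X^3 + A*X + B


def mul(x, y):
    (a, b), (c, d) = x, y
    return ((a * c + NONRES * b * d) % p, (a * d + b * c) % p)


def add(x, y):
    return ((x[0] + y[0]) % p, (x[1] + y[1]) % p)


def power(x, n):
    result = (1, 0)
    while n:
        if n & 1:
            result = mul(result, x)
        x = mul(x, x)
        n >>= 1
    return result


def is_square(t):
    # Euler's criterion in F_{p^2}: a non-zero t is a square iff t^((p^2-1)/2) == 1.
    return t == (0, 0) or power(t, (p * p - 1) // 2) == (1, 0)


def sqrt_brute(t):
    # Brute-force search stands in for Tonelli-Shanks (fine for 529 elements).
    return next((a, b) for a in range(p) for b in range(p) if mul((a, b), (a, b)) == t)


def rhs(x):
    return add(add(mul(mul(x, x), x), mul(A, x)), B)


def encode(m):
    """Map m in F_p to a curve point with x = m + r*theta, retrying until rhs(x) is a square."""
    while True:
        x = (m % p, secrets.randbelow(p))
        if is_square(rhs(x)):
            return x, sqrt_brute(rhs(x))


def decode(point):
    (x0, _), _ = point
    return x0  # drop the theta-coefficient r


pt = encode(7)
print(decode(pt), mul(pt[1], pt[1]) == rhs(pt[0]))  # 7 True
```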

Given $M$, we encrypt it by selecting a random $k \in \mathbb{Z}_q$ and computing $(C_1, C_2) = (k \cdot P, M + k \cdot Q)$, where $Q = sk \cdot P$ is the public key corresponding to the secret key $sk$. Decryption is obtained via $C_2 - sk \cdot C_1$, and then simply taking the x-coordinate as $x_0 + x_1 \cdot \theta$ and returning $x_0$.
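The encryption and decryption formulas can be checked on a toy curve. To keep the arithmetic short, this sketch works over the prime field $\mathbb{F}_{97}$ with the hypothetical curve $Y^2 = X^3 + 2X + 3$, whose group order is not assumed to be prime, instead of $E(\mathbb{F}_{p^{2k}})$. The message point `M` is simply taken from a list of curve points rather than produced by the encoding above, and all names are illustrative.

```python
import secrets

# Toy curve over a prime field (assumption): Y^2 = X^3 + 2X + 3 over F_97.
# The document's curve lives over F_{p^{2k}} and has prime order; this one need not.
p, a, b = 97, 2, 3
O = None  # point at infinity


def ec_add(P1, P2):
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return O
    if P1 == P2:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def ec_neg(P1):
    return O if P1 is O else (P1[0], -P1[1] % p)


def ec_mul(k, P1):
    # Double-and-add scalar multiplication.
    R = O
    while k:
        if k & 1:
            R = ec_add(R, P1)
        P1 = ec_add(P1, P1)
        k >>= 1
    return R


def encrypt(P, Q, M):
    k = secrets.randbelow(p) + 1  # toy scalar range; the text draws k from Z_q
    return ec_mul(k, P), ec_add(M, ec_mul(k, Q))


def decrypt(sk, C1, C2):
    return ec_add(C2, ec_neg(ec_mul(sk, C1)))  # C2 - sk * C1


points = [(x, y) for x in range(p) for y in range(p) if (y * y - x**3 - a * x - b) % p == 0]
P, M, sk = points[0], points[5], 13
Q = ec_mul(sk, P)
print(decrypt(sk, *encrypt(P, Q, M)) == M)  # True
```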

We need to perform the encryption, however, on values which are shared via the $\mathcal{F}_{MPC}$ functionality, and to decrypt so as to obtain values which are shared via the $\mathcal{F}_{MPC}$ functionality. We first note that since the $\mathcal{F}_{MPC}$ functionality can evaluate arithmetic circuits over $\mathbb{F}_{p^k}$, it can also evaluate circuits over $\mathbb{F}_{p^{2k}}$; so for ease of exposition we assume that $\mathcal{F}_{MPC}$ is defined over $\mathbb{F}_{p^{2k}}$. We can therefore define the functionality $\mathrm{Enc}_{Elg}$, given in Figure 11, in the $\mathcal{F}_{MPC}$-hybrid model. To ease notation, in what follows we let $[P]$ denote a sharing of an elliptic curve point $P$ in the $\mathcal{F}_{MPC}$ functionality. To save space we have included the protocol implementing $\mathrm{Enc}_{Elg}$ within the description of the functionality itself.
Functionality $\mathrm{Enc}_{Elg}$

KeyGen: This generates the public key for the ElGamal encryption, given a shared secret key. The secret key is stored as shared bits for convenience, i.e. $[sk] = \sum_i [sk_i] \cdot 2^i$.

1. Player $i$ calls (input, $P_i$, $x_{i,j}$) for $j = 0, \ldots, \log_2 q$, where each $x_{i,j} \in \{0, 1\}$ is selected at random by player $i$.

2. Define $[Q]$ as the sharing of the point at infinity.

3. This step forms $[sk_i] = \bigoplus_{j} [x_{j,i}]$ and $[Q] = \sum_i [sk_i] \cdot 2^i \cdot P$, and ensures that the values the players input in the first step are in $\{0, 1\}$. We perform this step by executing, for $i = 0, \ldots, \log_2 q$:

— $[sk_i] = [x_{1,i}]$.

— For $j = 2, \ldots, n$ do $[sk_i] = -2 \cdot [sk_i] \cdot [x_{j,i}] + [sk_i] + [x_{j,i}]$, using the MPC functionality.

— Compute $[t_i] = [sk_i] \cdot ([sk_i] - 1)$, again using the MPC functionality.

— Call (output, $t_i$) to open $[t_i]$; if the value is not zero then restart.

— Execute $[Q] = [Q] + [sk_i] \cdot 2^i \cdot P$. Here we use the $\mathcal{F}_{MPC}$ functionality to evaluate the conditional elliptic curve addition.

4. The players call (output, $Q$) to open $[Q]$.

Encrypt: This takes an input message $[m]$, where $m \in \mathbb{F}_{p^k}$, and outputs an ElGamal ciphertext $(C_1, C_2)$.

1. Using a method similar to that for KeyGen above, the players generate sharings of bits $[k_i]$ for $i = 0, \ldots, \log_2 q$ and then evaluate $[kP]$ and $[kQ]$ for $k$ the integer with bit representation given by the shared bits $[k_i]$.

2. The players call (random, $r$).

3. The players execute $[x] = [m] + \theta \cdot [r]$.

4. The players execute $[t] = [x]^3 + A \cdot [x] + B$.

5. The players compute $[s] = [t]^{(p^{2k}-1)/2}$ and call (output, $s$) to open $[s]$.

6. If $s \neq 1$ then go to step 2.

7. The players execute the Tonelli-Shanks algorithm to extract the square root $[y]$ of $[t]$ using the $\mathcal{F}_{MPC}$ functionality.

8. The players execute $[C] = ([x], [y]) + [kQ]$.

9. The players call (output, ·) on the x- and y-coordinates of $[kP]$ and $[C]$ so as to obtain $C_1$ and $C_2$.

Decrypt: Obtain the sharing of the message $[m]$ corresponding to ciphertext $(C_1, C_2)$.

1. The players execute, using $\mathcal{F}_{MPC}$, the operations corresponding to $[C] = C_2 - \sum_i [sk_i] \cdot 2^i \cdot C_1$.

2. Consider $[C]$ as having x-coordinate $[m] + \theta \cdot [m']$ and output $[m]$.


Fig. 11: ElGamal Functionality
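Step 3 of KeyGen combines the players' bits with the arithmetic form of XOR, $x \oplus y = x + y - 2xy$, which needs only additions and one multiplication inside $\mathcal{F}_{MPC}$, and then tests $s(s-1) = 0$. The sketch below runs that arithmetic in the clear over a hypothetical prime modulus `Q_MOD`; in the functionality every value is shared and only the test result is opened. Function names are illustrative.

```python
Q_MOD = 1019  # hypothetical prime modulus of the MPC field (illustration only)


def xor_arith(x, y):
    # For bits x, y: x XOR y == x + y - 2*x*y, one multiplication inside the MPC.
    return (-2 * x * y + x + y) % Q_MOD


def combine_bits(player_bits):
    # [sk_i] = [x_{1,i}], then fold in players 2..n with the arithmetic XOR.
    acc = player_bits[0]
    for x in player_bits[1:]:
        acc = xor_arith(acc, x)
    return acc


def bit_check(s):
    # t = s*(s - 1) is zero exactly when s is 0 or 1 (Q_MOD is prime).
    return (s * (s - 1)) % Q_MOD == 0


def secret_key(bits_per_player):
    # bits_per_player[j][i] is player j's bit i; the key is sum_i sk_i * 2^i.
    n_bits = len(bits_per_player[0])
    sk_bits = [combine_bits([row[i] for row in bits_per_player]) for i in range(n_bits)]
    if not all(bit_check(s) for s in sk_bits):
        raise ValueError("a non-bit was input: restart KeyGen")
    return sum(s << i for i, s in enumerate(sk_bits))


print(secret_key([[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0]]))  # 14
```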
B An Incomplete Attempt to Extend Existing Proofs of Shuffle


In this section we explain our attempt at extending the existing proofs of shuffle to extended permutations. Currently available solutions follow two main ideas. The first group, started by Furukawa and Sako [11], represents the permutation by a permutation matrix and then proves in zero knowledge that it is a valid permutation and that it is used in the computation. The second group, started by Neff [18], uses the fact that a polynomial is invariant under permutation of its roots.

In the second group, it is not obvious how to handle a varying number of repetitions for each root. On the other hand, it is possible to represent an EP using a matrix.

We therefore turn to modifying the method of Furukawa and Sako [11] (and the later work by Furukawa [10]) to check an extended permutation. We only describe the general idea; for more details concerning our modifications we refer the reader to the original paper [11]. In their protocol they use the matrix representation of the permutation and prove that the matrix used for computing the outputs is a valid permutation (i.e. there is exactly one non-zero element, equal to one, in each row and each column). For our purpose of extended permutation, it is enough to show that there is exactly one non-zero element, equal to one, in each column of the matrix. Theorem 4 gives the conditions for a matrix to be an extended permutation.

Theorem 4. A matrix $(A_{ij})_{i,j=1}^{n}$ is an extended permutation if and only if the following conditions hold. For all $i$:

$$\sum_{h=1}^{n} A_{hi} = 1 \pmod q \qquad (1)$$

For all $i, j$ with $i \neq j$:

$$\sum_{h=1}^{n} A_{ih} \cdot A_{jh} = 0 \pmod q \qquad (2)$$

For all $i, j, k$ with $\neg(i = j = k)$:

$$\sum_{h=1}^{n} A_{ih} \cdot A_{jh} \cdot A_{kh} = 0 \pmod q \qquad (3)$$

Proof (sketch). The first condition implies that there is at least one non-zero element in each column. Using an argument similar to [11], for $i \neq j$ the second and third conditions imply that the number of non-zero elements in each column is at most one. From the first condition, this non-zero element must be one.
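The conditions of Theorem 4 can be evaluated directly on a small matrix. In this sketch (hypothetical helper names, arithmetic in the clear), column $i$ of an EP matrix has its single one in row $\pi(i)$, so an input row may contain several ones or none; the zero-knowledge protocol would instead have to establish these conditions for a committed matrix.

```python
from itertools import product


def satisfies_theorem4(A, q):
    """Check conditions (1)-(3) of Theorem 4 for an n x n matrix A over Z_q."""
    n = len(A)
    # (1) every column sums to 1.
    if any(sum(A[h][i] for h in range(n)) % q != 1 for i in range(n)):
        return False
    # (2) for rows i != j: sum_h A[i][h] * A[j][h] == 0.
    for i, j in product(range(n), repeat=2):
        if i != j and sum(A[i][h] * A[j][h] for h in range(n)) % q != 0:
            return False
    # (3) for not(i == j == k): sum_h A[i][h] * A[j][h] * A[k][h] == 0.
    for i, j, k in product(range(n), repeat=3):
        if not (i == j == k) and sum(A[i][h] * A[j][h] * A[k][h] for h in range(n)) % q != 0:
            return False
    return True


def ep_matrix(pi, n):
    # Column i holds a single one, in row pi[i]: output i copies input pi[i].
    A = [[0] * n for _ in range(n)]
    for i, j in enumerate(pi):
        A[j][i] = 1
    return A


print(satisfies_theorem4(ep_matrix([0, 0, 1], 3), 101))            # True
print(satisfies_theorem4([[1, 0, 0], [1, 1, 0], [0, 0, 1]], 101))  # False
```

The second example fails condition (1) because its first column holds two ones; a column such as $(2, q-1, 0)^T$ sums to one modulo $q$ but is rejected by condition (2).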


This theorem allows us to adapt the zero-knowledge protocol given in [11]. The main challenge in their protocol is to give a proof for the conditions of equations (2) and (3). We assume that the prover has applied the extended permutation to the ciphertexts $(ct_1, \ldots, ct_n)$, where $ct_i = (C^{(i)}_1, C^{(i)}_2)$. The prover obtains the re-randomized ciphertexts $(ct'_1, \ldots, ct'_n)$, where $ct'_i = (C'^{(i)}_1, C'^{(i)}_2)$ and

$$C'^{(i)}_1 = k'_i \cdot P + C^{(\pi(i))}_1, \qquad C'^{(i)}_2 = k'_i \cdot Q + C^{(\pi(i))}_2.$$

To prove the condition in equation (2), we have to show that, given $\{C^{(j)}_1\}$ and $\{C'^{(i)}_1\}$, the prover knows $k'_i$ and $A_{ji}$ such that

$$C'^{(i)}_1 = k'_i \cdot P + \sum_{j=1}^{n} A_{ji} \cdot C^{(j)}_1 \qquad \text{and} \qquad \sum_{h=1}^{n} A_{ih} \cdot A_{jh} = 0 \quad (i \neq j).$$

In [11] they suggest issuing values $s$ and $s_i$ as a response to the challenge $c_j$ and letting the verifier check two conditions. We adjust $s_i$ for our modified scenario such that $s_i$ captures the condition of equation (2):

$$s_i = \sum_{j=1}^{n} A_{ji} c_j \pmod q.$$

At this point it is not obvious how to issue $s$ and define the second verification equation in light of the modified $s_i$.
C Proof of Protocol $\Pi_{OFFLINE\text{-}Linear}$


We construct a simulator $\mathcal{S}_{OFFLINE}$ such that a poly-time environment $Z$ cannot distinguish between the real protocol system and the ideal one. We assume here static, active corruption. The simulator runs a copy of the protocol given in Figure 9, which simulates the ideal functionality given in Figure 2. It relays messages between parties/$\mathcal{F}_{MPC}$ and $Z$, such that $Z$ sees the same interface as when interacting with a real protocol. The specification of the simulator $\mathcal{S}_{OFFLINE}$ is presented in Figure 12.

Simulator $\mathcal{S}_{OFFLINE}$

The protocol is described in the $\mathcal{F}_{MPC}$-hybrid model, thus we only need to specify the simulator for the Input Function. Let us denote the set of corrupted parties by $C \subset \{P_1, \ldots, P_N\}$.

Input Function:

$P_1$ Shares his Circuit/Function.

- $P_1 \in C$:

  • Simulator $\mathcal{S}_{OFFLINE}$ evaluates $[G_j] \cdot (1 - [G_j])$ for $j \in \{1, \ldots, g\}$. If any of them is not 0, the simulator aborts (since in this case $P_1$ has not entered a valid function).

- $P_1 \notin C$:

  • Simulator $\mathcal{S}_{OFFLINE}$ generates a random circuit with $g$ gates $G'_j$, $j \in \{1, \ldots, g\}$, and finds its corresponding mapping $\pi'$.

  • $\mathcal{S}_{OFFLINE}$ calls (input, $G'_j$) for all $j \in \{1, \ldots, g\}$.

  • Simulator $\mathcal{S}_{OFFLINE}$ evaluates $[G'_j] \cdot (1 - [G'_j])$ for $j \in \{1, \ldots, g\}$.

Players Generate Randomness for inputs and outputs of EP.

- Simulator $\mathcal{S}_{OFFLINE}$ follows the protocol honestly.

$P_1$ Applies the EP to $s\ell$ and $st$.

- $P_1 \notin C$:

  • Simulator $\mathcal{S}_{OFFLINE}$ randomly generates an extended permutation $\pi'$ and sends it to the $\mathcal{ZK}_{EP}$ ideal functionality. The simulator aborts if any of the players aborts.

- $P_1 \in C$:

  • The simulator follows the protocol honestly and sends $s\ell$ and $st$ to the $\mathrm{Enc}_{Elg}$ ideal functionality.

  • Simulator $\mathcal{S}_{OFFLINE}$ waits for $P_1$ to send $\pi$ to the $\mathcal{ZK}_{EP}$ ideal functionality; it then sends $\pi$ to the ideal functionality $\mathcal{F}_{OFFLINE}$. The simulator aborts if any of the players aborts.

  • The simulator follows the protocol honestly.

Players Check $P_1$'s Work and Compute $p_i, q_i$.

- Simulator $\mathcal{S}_{OFFLINE}$ follows the protocol honestly.

Fig. 12: Simulator $\mathcal{S}_{OFFLINE}$


To  see  that  the  simulated  and  real  processes  cannot  be  distinguished,  we  will  show  that  the  view  of  the 
environment  in  the  ideal  process  is  statistically  indistinguishable  from  the  view  in  the  real  process.  This  view 
consists  of  the  corrupt  players’  view  of  the  protocol  execution  as  well  as  the  inputs  and  outputs  of  honest 
players. 

The view of the adversaries $C \setminus \{P_1\}$ includes the following values:

- the shares of $G_i$;
- the shares of the random values for the inputs and outputs of the EP, such as $([s^\ell_1],\ldots,[s^\ell_{ow(f)}])$ and $([s^t_1],\ldots,[s^t_{ow(f)}])$;
- the permuted shares $[s^\ell_{\pi(j)}]$ and $[s^t_{\pi(j)}]$;
- the ElGamal ciphertexts $(sct^\ell_1,\ldots,sct^\ell_{ow(f)})$ and $(sct^t_1,\ldots,sct^t_{ow(f)})$, together with their re-randomizations;
- finally, $p_i, q_i$.

The shared values all look random and are therefore indistinguishable between the ideal and real executions. The ciphertexts $(sct^\ell_1,\ldots,sct^\ell_{ow(f)})$ and $(sct^t_1,\ldots,sct^t_{ow(f)})$ are ElGamal encryptions under a shared secret key, and are therefore indistinguishable from those of the real execution. The re-randomized ciphertexts are valid re-randomizations of these ElGamal ciphertexts whenever the protocol does not abort due to the $\mathcal{ZK}_{ep}$ verification. The shares $[s^\ell_{\pi(j)}]$ and $[s^t_{\pi(j)}]$ are fresh shares generated by the $\mathrm{Enc}_{\mathrm{ElG}}$ protocol. The final result $p_i, q_i$ is computed from two shared random values and therefore has a uniform distribution in both the ideal and real executions. The view of a malicious $P_1$ is the same as the view of the other malicious players. The shared values all have a uniform distribution. In the ideal functionality we also have a uniform distribution, and as a result the ideal and real executions are indistinguishable to the environment $\mathcal{Z}$.
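The argument above relies on one fact: re-randomizing an ElGamal ciphertext changes the ciphertext but not the plaintext. The following Python sketch illustrates this. It is a simplification that uses a single secret key rather than a shared one, in a toy subgroup of $\mathbb{Z}_p^*$. The parameters $p = 2039$, $q = 1019$ and $g = 4$ and all function names are illustrative assumptions, and they are far too small for real use.

```python
# Toy parameters (assumption): safe prime P = 2 * Q + 1, subgroup of squares of prime order Q.
P = 2039
Q = 1019
G = 4  # 4 = 2^2 is a square different from 1, so it generates the order-Q subgroup


def keygen(x: int) -> int:
    """Public key h = g^x for the secret key x (one key here, not a shared one)."""
    return pow(G, x, P)


def encrypt(h: int, m: int, r: int) -> tuple[int, int]:
    """ElGamal encryption (g^r, m * h^r) of a subgroup element m with randomness r."""
    return pow(G, r, P), (m * pow(h, r, P)) % P


def rerandomize(h: int, ct: tuple[int, int], r_new: int) -> tuple[int, int]:
    """Multiply in an encryption of 1: the plaintext is kept, the randomness becomes r + r_new."""
    c1, c2 = ct
    return (c1 * pow(G, r_new, P)) % P, (c2 * pow(h, r_new, P)) % P


def decrypt(x: int, ct: tuple[int, int]) -> int:
    """Recover m = c2 / c1^x."""
    c1, c2 = ct
    return (c2 * pow(pow(c1, x, P), -1, P)) % P
```

If $r'$ is uniform, then the new randomness $r + r'$ is uniform as well. The re-randomized ciphertext is therefore distributed exactly like a fresh encryption of the same plaintext.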


D Complete Description of Protocol $\Pi_{\mathrm{Offline}}$

See Figure 13 and Figure 14 for the description of protocol $\Pi_{\mathrm{Offline}}$.

E  Proof  of  Theorem  3 

We construct a simulator $\mathcal{S}_{\mathrm{Offline}}$ such that a poly-time environment $\mathcal{Z}$ cannot distinguish between the real protocol system and the ideal one. We assume here static, active corruption. The simulator runs a copy of the protocol given in Figure 13 and Figure 14, which simulates the ideal functionality given in Figure 2. It relays messages between the parties/$\mathcal{F}_{\mathrm{MPC}}$ and $\mathcal{Z}$, such that $\mathcal{Z}$ sees the same interface as when interacting with a real protocol. The specification of the simulator $\mathcal{S}_{\mathrm{Offline}}$ is presented in Figure 15.

To  see  that  the  simulated  and  real  processes  cannot  be  distinguished,  we  will  show  that  the  view  of  the 
environment  in  the  ideal  process  is  statistically  indistinguishable  from  the  view  in  the  real  process.  This  view 
consists  of  the  corrupt  players’  view  of  the  protocol  execution  as  well  as  the  inputs  and  outputs  of  honest 
players. 

The view of the adversaries $C \setminus \{P_1\}$ includes:

- the shares of $b_i$ and $G_i$;
- the shares of the wires' random values;
- the shares $[d^{\ell,i}]$ and $[d^{t,i}]$;
- finally, $p_i, q_i$.

The shared values all look random and are therefore indistinguishable between the ideal and real executions. The final result $p_i, q_i$ is computed from two shared random values, so it has a uniform distribution in both the ideal and real executions. These values are blinded by the shared values $\ell$ and $t$ respectively, and have a uniform distribution.

The view of a malicious $P_1$ includes the shares of $b_i$ and $G_i$, the shares of the wires' random values, the values $d^{\ell,i}, d^{t,i}, m^{\ell,i}, m^{t,i}$, and finally $p_i, q_i$. $P_1$ has the same view as the other malicious players except for the values that he has computed himself. It only remains to show two things for a malicious $P_1$: that these values have a uniform distribution, and that the checks guarantee the correctness of his computation.

Observe that $d^{\ell,i}$ is blinded using the random value of the input wires. This value is shared, so it acts as a one-time pad, and the distribution remains uniform as $P_1$ carries out the evaluation. By a similar argument, the remaining values computed by $P_1$ also have a uniform distribution. In the ideal functionality we also have a uniform distribution, and as a result the ideal and real executions are indistinguishable to the environment $\mathcal{Z}$.

In the final phase the players check $P_1$'s computation. If $P_1$ cheats, he has not calculated the $d$ values correctly. To succeed, he would have to adjust $n^{\ell,i}$ and $m^{\ell,i}$ (respectively $n^{t,i}$ and $m^{t,i}$) so that they are equal. He does not know the key $K$, so the check acts as a one-time MAC. He therefore cannot adjust his share to make the equality hold.

The probability of him getting away with it equals the probability of guessing $K$, and is hence exponentially small in the length of $K$. It follows that, with overwhelming probability, $P_1$'s computation has been done correctly whenever the check passes. If any check fails, the simulator aborts and stops.
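The one-time MAC step can be checked numerically. The Python sketch below works over the toy prime field $\mathbb{F}_{101}$, chosen only so that every key can be enumerated; the field and the helper names are assumptions. It takes an honest pair $(d, m)$ and the check $n = \mathrm{out}_m + d\cdot K$. Among all keys $K$ consistent with the honest pair, it counts how many would also accept a modified pair $(d', m')$.

```python
P = 101  # toy prime field (assumption), small enough to enumerate every key


def mac(out_m: int, d: int, key: int) -> int:
    """One-time MAC of d: out_m + d * K over F_P."""
    return (out_m + d * key) % P


def forgery_success_rate(d: int, m: int, d_forged: int, m_forged: int) -> float:
    """Fraction of keys K, consistent with the honest pair (d, m), under which
    the forged pair (d_forged, m_forged) also passes the check."""
    accepted = 0
    for key in range(P):
        # For each candidate key exactly one mask out_m is consistent with (d, m).
        out_m = (m - d * key) % P
        if mac(out_m, d_forged, key) == m_forged % P:
            accepted += 1
    return accepted / P
```

For any $d' \neq d$ and any $m'$, exactly one key accepts, so the success rate is $1/P$. In the protocol the corresponding probability is $1/|\mathbb{F}|$, which is exponentially small in the length of $K$.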


F Proof of Protocol $\Pi_{\mathrm{Online}}$

We construct a simulator $\mathcal{S}_{\mathrm{Online}}$ such that a poly-time environment $\mathcal{Z}$ cannot distinguish between the real protocol system and the ideal one. We assume here static, active corruption. The simulator runs a copy of the protocol $\Pi_{\mathrm{Online}}$ given in Figure 6, which simulates the ideal functionalities given in Figure 4. It relays messages between the parties/$\mathcal{F}_{\mathrm{Offline}}$ and $\mathcal{Z}$, such that $\mathcal{Z}$ sees the same interface as when interacting with a real protocol. The specification of the simulator $\mathcal{S}_{\mathrm{Online}}$ is presented in Figure 16.

Protocol $\Pi_{\mathrm{Offline}}$: Part I

The protocol is described in the $\mathcal{F}_{\mathrm{MPC}}$-hybrid model; thus the only operation we need to specify is the Input Function.

Input  Function: 

$P_1$ Shares his Circuit/Function.

-  $P_1$ determines a vector of selection bits $(b_1,\ldots,b_N)$ corresponding to the switching network representing the mapping $\pi$. Note that the switching network has $ow(f)$ input wires and $iw(f)$ output wires.

-  Player $P_1$ calls $(\mathsf{input}, b_i)$ for all $i \in \{1,\ldots,N\}$.

-  Player $P_1$ calls $(\mathsf{input}, G_j)$ for all $j \in \{1,\ldots,g\}$.

-  The players evaluate and open $[b_i]\cdot(1-[b_i])$ for all $i \in \{1,\ldots,N\}$, and similarly $[G_j]\cdot(1-[G_j])$ for all $j \in \{1,\ldots,g\}$. If any of them is not 0, the players abort (since in this case $P_1$ has not entered a valid function).

Players  Generate  Randomness  for  the  Switching  Network. 

-  The players call $(\mathsf{random}, K)$ of $\mathcal{F}_{\mathrm{MPC}}$.

-  The players call $(\mathsf{random}, \cdot)$ of $\mathcal{F}_{\mathrm{MPC}}$ to generate two pairs of shared random values for each wire in the switching network. One pair is used to map the $\ell$ values and the other to map the $t$ values (recall that each $j \in \{1,\ldots,ow(f)\}$ has a value $\ell_j$ and a value $t_j$).

Consider the $i$th switch, with input wires $j \in \{0,1\}$. We denote the two shared random pairs for its $j$th input wire by $([\mathrm{in}^{\ell,i}_{d,j}], [\mathrm{in}^{\ell,i}_{m,j}])$ and $([\mathrm{in}^{t,i}_{d,j}], [\mathrm{in}^{t,i}_{m,j}])$. The pairs for its two output wires are denoted $([\mathrm{out}^{\ell,i}_{d,j}], [\mathrm{out}^{\ell,i}_{m,j}])$ and $([\mathrm{out}^{t,i}_{d,j}], [\mathrm{out}^{t,i}_{m,j}])$. (The $d$ subscript means that the random value is used to process data, i.e. actual wire values, while the $m$ subscript means that it is used for the corresponding MACs. The subscript $j \in \{0,1\}$ identifies the wire of the switch: 0 denotes the top wire and 1 the bottom wire.)

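-  Then, for each switch $i$ in the network, the players perform the following (in parallel):

•  The players call $\mathcal{F}_{\mathrm{MPC}}$ to evaluate and open the following for $j \in \{0,1\}$. These equations are for a type 1 switch; a similar approach works for type 2 switches.

$$[\gamma^{\ell,i}_{d,j}] = (1-[b_i])\cdot\big([\mathrm{out}^{\ell,i}_{d,j}] - [\mathrm{in}^{\ell,i}_{d,j}]\big) + [b_i]\cdot\big([\mathrm{out}^{\ell,i}_{d,1-j}] - [\mathrm{in}^{\ell,i}_{d,j}]\big),$$

$$[\gamma^{t,i}_{d,j}] = (1-[b_i])\cdot\big([\mathrm{out}^{t,i}_{d,j}] - [\mathrm{in}^{t,i}_{d,j}]\big) + [b_i]\cdot\big([\mathrm{out}^{t,i}_{d,1-j}] - [\mathrm{in}^{t,i}_{d,j}]\big),$$

$$[\gamma^{\ell,i}_{m,j}] = (1-[b_i])\cdot\big([\mathrm{out}^{\ell,i}_{m,j}] - [\mathrm{in}^{\ell,i}_{m,j}]\big) + [b_i]\cdot\big([\mathrm{out}^{\ell,i}_{m,1-j}] - [\mathrm{in}^{\ell,i}_{m,j}]\big) + \gamma^{\ell,i}_{d,j}\cdot[K],$$

$$[\gamma^{t,i}_{m,j}] = (1-[b_i])\cdot\big([\mathrm{out}^{t,i}_{m,j}] - [\mathrm{in}^{t,i}_{m,j}]\big) + [b_i]\cdot\big([\mathrm{out}^{t,i}_{m,1-j}] - [\mathrm{in}^{t,i}_{m,j}]\big) + \gamma^{t,i}_{d,j}\cdot[K].$$

Note that the final two equations can be evaluated using the opened values of $\gamma^{\ell,i}_{d,j}$ and $\gamma^{t,i}_{d,j}$.

-  For $i \in \{1,\ldots,ow(f)\}$ the players call $\mathcal{F}_{\mathrm{MPC}}$ to evaluate and open the following, where $j = i \bmod 2$:

$$[h^{\ell}_{d,i}] = [\ell_i] + [\mathrm{in}^{\ell,\lceil i/2\rceil}_{d,j}], \qquad [h^{\ell}_{m,i}] = [\mathrm{in}^{\ell,\lceil i/2\rceil}_{m,j}] + h^{\ell}_{d,i}\cdot[K],$$

$$[h^{t}_{d,i}] = [t_i] + [\mathrm{in}^{t,\lceil i/2\rceil}_{d,j}], \qquad [h^{t}_{m,i}] = [\mathrm{in}^{t,\lceil i/2\rceil}_{m,j}] + h^{t}_{d,i}\cdot[K].$$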


Fig.  13:  The  protocol  to  implement  the  Offline  Phase:  Part  I 
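The Python sketch below simulates one type 1 switch in the clear over an example prime field, to show why the opened differences route masked values through a switch. The field, the variable names and the use of plain values instead of shares are illustrative assumptions.

A value $v$ entering input wire $j$ carries the data mask $\mathrm{in}_{d,j}$ and the MAC $\mathrm{in}_{m,j} + (v + \mathrm{in}_{d,j})\cdot K$. Adding the opened differences turns these into the corresponding quantities for the output wire selected by $b$: output $j$ if $b = 0$, output $1-j$ if $b = 1$.

```python
P = 2**61 - 1  # example prime field (assumption)


def switch_differences(b, j, in_d, out_d, in_m, out_m, key):
    """Data and MAC differences for input wire j of a type 1 switch with selection bit b."""
    diff_d = ((1 - b) * (out_d[j] - in_d[j]) + b * (out_d[1 - j] - in_d[j])) % P
    diff_m = ((1 - b) * (out_m[j] - in_m[j]) + b * (out_m[1 - j] - in_m[j])
              + diff_d * key) % P
    return diff_d, diff_m


def route(v, b, j, in_d, out_d, in_m, out_m, key):
    """Push a masked value and its MAC from input wire j to the switch output."""
    masked = (v + in_d[j]) % P           # opened masked value on the input wire
    tag = (in_m[j] + masked * key) % P   # matching MAC on the input wire
    diff_d, diff_m = switch_differences(b, j, in_d, out_d, in_m, out_m, key)
    return (masked + diff_d) % P, (tag + diff_m) % P
```

For $b = 0$ the result equals $(v + \mathrm{out}_{d,j},\ \mathrm{out}_{m,j} + (v + \mathrm{out}_{d,j})\cdot K)$. For $b = 1$ it is the same with $1-j$ in place of $j$. In both cases the output is again a correctly masked and MACed value.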
Protocol $\Pi_{\mathrm{Offline}}$: Part II

$P_1$ Maps the $\ell$ and $t$ Values Using the Above Randomness.

-  For $i \in \{1,\ldots,iw(f)\}$, $P_1$ determines the sequence of switches involved in mapping the input label $\pi(i) \in \{1,\ldots,ow(f)\}$ to the output label $i \in \{1,\ldots,iw(f)\}$. Denote the sequence of switches by $(i_1,\ldots,i_k)$, and the indices of the input wires the value goes through by $j_1,\ldots,j_k$. Note that $k = O(\log N)$, $i_k = \lceil i/2 \rceil$ and $j_k = i \bmod 2$.

-  $P_1$ then computes the following $d$ and $m$ values by adding up the values opened above along the path. He calls $(\mathsf{input}, \cdot)$ of $\mathcal{F}_{\mathrm{MPC}}$ on each, to store the value in the functionality (i.e. share it among the parties):

$$d^{\ell,i} = \ell_{\pi(i)} + \mathrm{out}^{\ell,i_k}_{d,j_k}, \qquad d^{t,i} = t_{\pi(i)} + \mathrm{out}^{t,i_k}_{d,j_k},$$

$$m^{\ell,i} = \mathrm{out}^{\ell,i_k}_{m,j_k} + d^{\ell,i}\cdot K, \qquad m^{t,i} = \mathrm{out}^{t,i_k}_{m,j_k} + d^{t,i}\cdot K.$$

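Players Check $P_1$'s Work and Compute $p_i, q_i$.

-  For $i \in \{1,\ldots,iw(f)\}$ the players call $\mathcal{F}_{\mathrm{MPC}}$ to compute the following, where $j = i \bmod 2$:

$$[n^{\ell,i}] = [\mathrm{out}^{\ell,\lceil i/2\rceil}_{m,j}] + [d^{\ell,i}]\cdot[K], \qquad [n^{t,i}] = [\mathrm{out}^{t,\lceil i/2\rceil}_{m,j}] + [d^{t,i}]\cdot[K].$$

-  The parties then compute and open $[n^{\ell,i} - m^{\ell,i}]$ and $[n^{t,i} - m^{t,i}]$. If either is not 0, the players call Cheat(1) on the $\mathcal{F}_{\mathrm{MPC}}$ functionality. This will either abort, or return the input of $P_1$ (and hence the function). In the latter case the players can proceed to evaluate the function using standard MPC, without the need for $P_1$ to be involved. If the opened values are zero, the players compute and open

$$[p_i] = [r_i] - [d^{\ell,i}] + [\mathrm{out}^{\ell,\lceil i/2\rceil}_{d,j}] = [r_i - \ell_{\pi(i)}],$$

$$[q_i] = [s_i] - [d^{t,i}] + [\mathrm{out}^{t,\lceil i/2\rceil}_{d,j}] + p_i\cdot[K] = [s_i - t_{\pi(i)} + p_i\cdot K].$$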


Fig.  14:  The  protocol  to  implement  the  Offline  Phase:  Part  II 
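The check in Figure 14 can likewise be simulated in the clear. The example field, the function names and the list-of-masks representation of one path are assumptions.

In the sketch, $P_1$ adds up the opened masked value and the opened per-switch differences along one path. Consecutive switches share a wire, so the sums telescope to $d = \ell_{\pi(i)} + \mathrm{out}_d$ and $m = \mathrm{out}_m + d\cdot K$. As a result, the players' recomputed $n$ matches $m$, while any change to $d$ alone is caught.

```python
P = 2**61 - 1  # example prime field (assumption)


def p1_path_values(ell, key, masks_d, masks_m):
    """Sum of opened values along one path, as computed by P_1.

    masks_d[0], masks_m[0] are the masks of the network input wire, and masks_d[r],
    masks_m[r] those of the wire leaving the r-th switch on the path. The masks and
    the key are used here only to produce the values that would be opened; P_1
    itself only sees the opened quantities and their sums."""
    d = (ell + masks_d[0]) % P        # opened masked input value
    m = (masks_m[0] + d * key) % P    # opened MAC on that value
    for r in range(1, len(masks_d)):
        diff_d = (masks_d[r] - masks_d[r - 1]) % P
        diff_m = (masks_m[r] - masks_m[r - 1] + diff_d * key) % P
        d = (d + diff_d) % P
        m = (m + diff_m) % P
    return d, m


def players_check(d, m, out_m, key):
    """Recompute n = out_m + d * K and accept iff n - m == 0."""
    n = (out_m + d * key) % P
    return (n - m) % P == 0
```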
Simulator $\mathcal{S}_{\mathrm{Offline}}$

The protocol is described in the $\mathcal{F}_{\mathrm{MPC}}$-hybrid model; thus we only need to specify the simulator for the Input Function. Let us denote the set of corrupted parties by $C \subset \{P_1,\ldots,P_N\}$.

Input Function:

$P_1$ Shares his Circuit/Function.

-  $P_1 \in C$:

•  The simulator $\mathcal{S}_{\mathrm{Offline}}$ runs the protocol honestly. It then waits for $P_1$ to broadcast $b_i$ for all $i \in \{1,\ldots,N\}$ and $G_j$ for all $j \in \{1,\ldots,g\}$, and sends them to the ideal functionality $\mathcal{F}_{\mathrm{Offline}}$.

•  The simulator $\mathcal{S}_{\mathrm{Offline}}$ evaluates $b_i\cdot(1-b_i)$ for all $i \in \{1,\ldots,N\}$, and similarly $G_j\cdot(1-G_j)$ for all $j \in \{1,\ldots,g\}$. If any of them is not 0, the simulator aborts (since in this case $P_1$ has not entered a valid function).

-  $P_1 \notin C$:

•  The simulator $\mathcal{S}_{\mathrm{Offline}}$ generates a random circuit with $g$ gates $G'_j$, $j \in \{1,\ldots,g\}$, and finds its corresponding mapping $\pi'$. It then determines a vector of selection bits $(b'_1,\ldots,b'_N)$ corresponding to the switching network representing the mapping $\pi'$.

•  $\mathcal{S}_{\mathrm{Offline}}$ calls $(\mathsf{input}, b'_i)$ for all $i \in \{1,\ldots,N\}$.

•  $\mathcal{S}_{\mathrm{Offline}}$ calls $(\mathsf{input}, G'_j)$ for all $j \in \{1,\ldots,g\}$.

•  The simulator $\mathcal{S}_{\mathrm{Offline}}$ evaluates $[b'_i]\cdot(1-[b'_i])$ for all $i \in \{1,\ldots,N\}$, and similarly $[G'_j]\cdot(1-[G'_j])$ for all $j \in \{1,\ldots,g\}$.

Players  Generate  Randomness  for  the  Switching  Network. 

-  The simulator $\mathcal{S}_{\mathrm{Offline}}$ follows the protocol honestly.

$P_1$ Maps the $\ell$ and $t$ Values Using the Randomness.

-  The simulator $\mathcal{S}_{\mathrm{Offline}}$ follows the protocol honestly.

Players Check $P_1$'s Work and Compute $p_i, q_i$.

-  The players follow the steps of the protocol, and the simulator aborts if any of the players fails a check.

Fig. 15: Simulator $\mathcal{S}_{\mathrm{Offline}}$
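Both the protocol and the simulator validate $P_1$'s input by checking that every $b_i(1-b_i)$ and $G_j(1-G_j)$ is zero. This works because $x(1-x) = 0$ has exactly the two roots $0$ and $1$ in any field. The Python sketch below works over an example prime field, $p = 2^{61}-1$ (an assumption). In the protocol the products would be computed on shares by $\mathcal{F}_{\mathrm{MPC}}$ and only then opened, rather than evaluated in the clear.

```python
P = 2**61 - 1  # example prime field (assumption); any field has the same two roots


def is_valid_bit(x: int) -> bool:
    """x * (1 - x) == 0 in F_P holds iff x is 0 or 1."""
    return (x * (1 - x)) % P == 0


def function_is_valid(selection_bits, gate_types) -> bool:
    """Mirror of the abort condition: every b_i and every G_j must be a bit."""
    return all(is_valid_bit(v) for v in list(selection_bits) + list(gate_types))
```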
Simulator $\mathcal{S}_{\mathrm{Online}}$

The protocol is described in the $\mathcal{F}_{\mathrm{Offline}}$-hybrid model.

Input Function:

- If $P_1 \notin C$, the simulator generates a random circuit with $g$ gates and corresponding mapping $\pi'$, and follows the protocol honestly.
- If $P_1 \in C$, the simulator $\mathcal{S}_{\mathrm{Online}}$ runs the protocol honestly. It then waits for $P_1$ to broadcast $\pi$ and $f$, and sends them to the ideal functionality $\mathcal{F}_{\mathrm{Online}}$.

Input Data:

- If $P_i \notin C$, the simulator generates a dummy input and follows the steps of the protocol honestly.
- If $P_i \in C$, the simulator runs the protocol honestly and waits for the corrupted parties to send their inputs to $\mathcal{F}_{\mathrm{Offline}}$. It then sends these inputs to the $\mathcal{F}_{\mathrm{Online}}$ ideal functionality.

Output: The simulator follows the protocol steps honestly. For $P_1 \notin C$:
-  Preparing Inputs to the Circuit:

•  The simulator follows the steps of the protocol honestly.

-  Evaluating the Circuit: For every gate $1 \le i \le g$ in the circuit, the players execute the following steps. Here we assume that the gates are indexed in the same topological order that $P_1$ chose to determine $\pi$.

•  $P_1$ Prepares the Two Inputs for Gate $i$.

*  The simulator follows the steps of the protocol honestly.

•  Players Check $P_1$'s Input Preparation.

*  The simulator follows the steps of the protocol honestly and aborts if any check fails.

•  Players Jointly Evaluate Gate $i$.

*  The players store the values $[y_{ij}] = d_{ij} - [r_{ij}]$, for $j \in \{0,1\}$, in the $\mathcal{F}_{\mathrm{MPC}}$ functionality.

*  The $\mathcal{F}_{\mathrm{MPC}}$ functionality is then executed so as to compute the output of the gate as

$$[z_i] = (1-[G_i])\cdot([y_{i0}] + [y_{i1}]) + [G_i]\cdot[y_{i0}]\cdot[y_{i1}].$$

*  The first $n$ outgoing wires are input wires, so the output wire of the $i$th gate is indexed $n + i$. The outgoing wire label corresponding to the output wire of the $i$th gate is therefore $j = n + i$, and we simply relabel $[z_i]$ as $[z_j]$.

*  The players compute via the MPC functionality $[u_j] = [z_j] + [\ell_j]$.

*  The players call $(\mathsf{Output}, u_j)$ so as to obtain $u_j$.

*  The players then compute via the MPC functionality

$$[v_j] = [t_j] + u_j\cdot[K] = [t_j + (z_j + \ell_j)\cdot K].$$

*  The players call $(\mathsf{Output}, v_j)$ so as to obtain $v_j$. If $j$ is an output wire, the simulator adjusts his share of the output in the ideal execution, to make the output consistent with the shares of the honest parties. Suppose the output of that wire using the dummy values is $z_i$, and the output returned by the $\mathcal{F}_{\mathrm{Online}}$ ideal functionality is $z'_i$. The simulator then adds $z_i - z'_i$ to the adversary's share of $[z'_i]$ in the ideal execution.

Fig.  16:  The  Protocol  for  implementing  PFE 
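Assume, as the formula suggests, that wire values are bits and that addition is taken in characteristic 2. Then the gate formula in Figure 16 computes XOR when $G_i = 0$ and AND when $G_i = 1$. A quick Python truth-table check over $\mathbb{F}_2$ (the function name is hypothetical):

```python
def eval_gate(g: int, y0: int, y1: int) -> int:
    """z = (1 - G) * (y0 + y1) + G * y0 * y1 over F_2: XOR if G = 0, AND if G = 1."""
    return ((1 - g) * (y0 + y1) + g * y0 * y1) % 2
```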
To see that the simulated and real processes cannot be distinguished, we will show that the view of the environment in the ideal process is statistically indistinguishable from the view in the real process. This view consists of the corrupt players' view of the protocol execution, as well as the inputs and outputs of honest players. The view of the adversary includes $[u_i]$, $[v_i]$, $n_{ij}$ and $[z_i]$, together with $z_i$ if $i$ is the index of an output wire.

The  shared  values  all  look  random  and  therefore  are  indistinguishable  between  ideal  and  real  execution. 

We next show that $d_{ij}$ and $m_{ij}$ have a uniform distribution. Observe that $u_i$ is blinded using the random value of the input wires. This value is shared and therefore acts as a one-time pad, and the uniform distribution is maintained as $P_1$ prepares the two inputs. Furthermore, $p_i$ also has a uniform distribution by the security of the offline protocol. The value $s_{ij}$ acts as a one-time pad which is shared between the players, and therefore $m_{ij}$ has a uniform distribution. In the ideal functionality we also have a uniform distribution, and as a result the ideal and real executions are indistinguishable to the environment $\mathcal{Z}$.

For a malicious $P_1$ the distributions are the same, but we have to make sure that he has performed the input preparation correctly. In the next phase the players check $P_1$'s computation. If $P_1$ cheats, he has not calculated $d_{ij}$ and $m_{ij}$ correctly. To succeed, he has to somehow adjust $n_{ij}$ and $m_{ij}$ so that they are equal. His only option is to adjust $d_{ij}$ and his share of $[s_{ij}]$ to make the equality hold. Since he does not know $K$, the value $d_{ij}\cdot K$ has a uniform distribution. The probability of him modifying $[s_{ij}]$ so that the equality holds is therefore equivalent to guessing $K$, and is hence exponentially small in the length of $K$. It follows that, with overwhelming probability, $P_1$'s computation has been done correctly whenever the check passes. If any check fails, the simulator aborts and stops.

The final result $z_i$ is a secret-shared value and as a result has a uniform distribution. For the output wires, the players open their shares, and $z_i$ is learnt by all parties. To make the distribution of outputs indistinguishable, the simulator has to modify his share of $z_i$ in the ideal execution. He is able to do so and produce exactly the same output for the ideal execution, as described in Figure 16. This completes the proof.
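The simulator's output adjustment amounts to shifting one additive share. The Python sketch below assumes XOR sharing, i.e. additive sharing over $\mathbb{F}_2$; the sharing scheme and the names are illustrative assumptions. Adding $z_i - z'_i$ to the adversary's share (which equals $z_i \oplus z'_i$ over $\mathbb{F}_2$) makes the shares reconstruct to $z'_i$ instead of $z_i$.

```python
from functools import reduce
from operator import xor


def reconstruct(shares):
    """Additive sharing over F_2: the secret is the XOR of all shares."""
    return reduce(xor, shares, 0)


def adjust_adversary_share(shares, adv_index, z_dummy, z_ideal):
    """Shift the adversary's share by z_dummy - z_ideal (= z_dummy XOR z_ideal over F_2)."""
    adjusted = list(shares)
    adjusted[adv_index] ^= z_dummy ^ z_ideal
    return adjusted
```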
Abstract. We apply the Flush+Reload side-channel attack, based on cache hits/misses, to extract a small amount of data from OpenSSL ECDSA signature requests. We then apply a “standard” lattice technique to extract the private key. Unlike previous attacks, we are able to make use of the side-channel information from almost all of the observed executions. This means we obtain private key recovery by observing a relatively small number of executions, and by expending a relatively small amount of post-processing via lattice reduction. We demonstrate our analysis via experiments using the curve secp256k1 used in the Bitcoin protocol. In particular, we show that with as few as 200 signatures we are able to achieve a reasonable level of success in recovering the secret key for a 256-bit curve. This is significantly better than prior methods of applying lattice reduction techniques to similar side-channel information.


1  Introduction 

One  important  task  of  cryptographic  research  is  to  analyze  cryptographic  implementations  for  potential  security  flaws. 
This  aspect  has  a  long  tradition,  and  the  most  well  known  of  this  line  of  research  has  been  the  understanding  of 
side-channels  obtained  by  power  analysis,  which  followed  from  the  initial  work  of  Kocher  and  others  [24].  More 
recently  work  in  this  area  has  shifted  to  looking  at  side-channels  in  software  implementations,  the  most  successful  of 
which  has  been  the  exploitation  of  cache-timing  attacks,  introduced  in  2002  [34].  In  this  work  we  examine  the  use  of 
spy-processes  on  the  OpenSSL  implementation  of  the  ECDSA  algorithm. 

OpenSSL [33] is an open source toolkit for the implementation of cryptographic protocols. The library of functions, implemented in C, is often used for the implementation of the Secure Sockets Layer and Transport Layer Security protocols. It has also been used to implement OpenPGP and other cryptographic standards. The library includes cryptographic functions for use in Elliptic Curve Cryptography (ECC), and in particular ECDSA. Specifically, we will examine the application of the Flush+Reload attack, running on the X86 processor architecture. The attack was first proposed by Yarom and Falkner [43], and then adapted by Yarom and Benger [42] to the case of OpenSSL's implementation of ECDSA over binary fields. We use the Flush+Reload cache side-channel attack [42,43], which exploits a property of the Intel implementation of the X86 and X86_64 processor architectures, to partially recover the ephemeral key used in ECDSA.

In  Yarom  and  Benger  [42]  the  case  of  characteristic  two  fields  was  considered,  but  the  algorithms  used  by  OpenSSL 
in  the  characteristic  two  and  prime  characteristic  cases  are  very  different.  In  particular  for  the  case  of  prime  fields  one 
needs  to  perform  a  post-processing  of  the  side-channel  information  using  cryptanalysis  of  lattices.  We  adopt  a  standard 
technique  [23, 32]  to  perform  this  last  step,  but  in  a  manner  which  enables  us  to  recover  the  underlying  secret  with 
few protocol execution runs. This is achieved by using as much of the information obtained in the Flush+Reload step as
possible  in  the  subsequent  lattice  step. 

We  illustrate  the  effectiveness  of  the  attack  by  recovering  the  secret  key  with  a  very  high  probability  using  only  a 
small  number  of  signatures.  After  this,  we  are  able  to  forge  unlimited  signatures  under  the  hidden  secret  key.  The  results 
of  this  attack  are  not  limited  to  ECDSA  but  have  implications  for  many  other  cryptographic  protocols  implemented 
using  OpenSSL  for  which  the  scalar  multiplication  is  performed  using  a  sliding  window  and  the  scalar  is  intended  to 
remain  secret. 

© IACR 2014, CHES 2014. This article is a minor revision of the version to be published by Springer-Verlag.

Related Work: Microarchitectural side-channel attacks have been used against a number of implementations of cryptosystems. These attacks often target the L1 cache level [1,2,5,10,13,14,39,44] or the branch prediction buffer [3,4]. The use of these components is limited to a single execution core. Consequently, the spy program and the victim must execute on the same execution core of the processor. Unlike these attacks, the Flush+Reload attack we use targets the last level cache (LLC). As the LLC is shared between cores, the attack can be mounted between different cores.

The attack used by Gullasch et al. [22] against AES is very similar to Flush+Reload. However, it requires the interleaving of spy and victim execution on the same processor core. This is achieved by relying on a scheduler bug to interrupt the victim and gain control of the core on which it executes. Furthermore, the Gullasch et al. attack results in a large number of false positives, requiring the use of a neural network to filter the results.

In [43], Yarom and Falkner first describe the Flush+Reload attack. They use it to snoop on the square-and-multiply exponentiation in the GnuPG implementation of RSA, and thus retrieve the RSA secret key from the GnuPG decryption step. The OpenSSL (characteristic two) implementation of ECDSA was also shown to be vulnerable to the Flush+Reload attack [42]; around 95% of the ephemeral private key was recovered when the Montgomery ladder was used for the scalar multiplication step. The full ephemeral private key was then recovered at very small cost using a Baby-Step-Giant-Step (BSGS) algorithm. Knowledge of the ephemeral private key leads to recovery of the signer's private key, thus fully breaking the ECDSA implementation using only one signature.

One issue hinders the extension of the attack to implementations that use the sliding window method for scalar multiplication instead of the Montgomery ladder. Only a lower proportion of the bits of the ephemeral private key can be recovered, so the BSGS reconstruction becomes infeasible. This paper addresses the extension of the Flush+Reload attack to implementations which use sliding window exponentiation methods.

Suppose we take a single ECDLP instance, and we have obtained partial information about the discrete logarithm. In [21, 28, 38] techniques are presented which reduce the search space for the underlying discrete logarithm when various types of partial information are revealed. These methods work quite well when the information leaked about the single discrete logarithm instance is considerable, as evidenced for example by the side-channel attack of [42] on the Montgomery ladder. However, in our situation a different approach needs to be taken.

Similar to several past works, e.g. [10, 11, 29], we will exploit a well known property of ECDSA. If a small amount of information about the ephemeral key of each signature leaks, for a number of signatures, then one can recover the underlying secret using a lattice based attack [23, 32]. The key question is how many signatures are needed to extract the side-channel information required for the lattice based attack to work. The lattice attack works by constructing a lattice problem from the obtained digital signatures and side-channel information, and then applying lattice reduction techniques such as LLL [25] or BKZ [37] to solve the lattice problem. Using this methodology Nguyen and Shparlinski [32] suggest that for an elliptic curve group of order around 160 bits,
their  probabilistic  algorithm  would  obtain  the  secret  key  using  an  expected  23  x2^  signatures  (assuming  independent 
and  uniformly  at  random  selected  messages)  in  polynomial  time,  using  only  seven  consecutive  least  significant  leaked 
bits  of  each  ephemeral  private  key.  A  major  issue  of  their  attack  in  practice  is  that  it  seems  hard  to  apply  when  only  a 
few  bits  of  the  underlying  ephemeral  private  key  are  determined. 

Our Contribution: Through the Flush+Reload attack we are able to obtain a significant proportion of the ephemeral private key bit values. However, they are not clustered; instead they lie in positions spread through the length of the ephemeral private key. As a result, we only obtain a few (maybe only one) consecutive bits of the ECDSA ephemeral private key for each signature. The technique described in [32] therefore does not at first sight appear to be directly applicable.

The main contribution of this work is to combine and adapt the Flush+Reload attack and the lattice techniques. The Flush+Reload attack is refined to optimise the proportion of information which can be obtained, and the lattice techniques are then adapted to utilize the information in the acquired data in an optimal manner. The result is that we are able to reconstruct secret keys for 256-bit elliptic curves with high probability and low work effort, after obtaining fewer than 256 signatures.

We illustrate the effectiveness of the attack by applying it to the OpenSSL implementation of ECDSA that uses a sliding window to compute scalar multiplication. We recover the victim's secret key for the elliptic curve secp256k1 used in Bitcoin [30]. The implementation of the secp256k1 curve in OpenSSL is interesting because it uses the wNAF method for exponentiation, as opposed to the GLV method [19] for which the curve was created. It would be an interesting research topic to see how to apply the Flush+Reload technique to an implementation which uses the GLV point multiplication method.

In  terms  of  the  application  to  Bitcoin  an  obvious  mitigation  against  the  attack  is  to  limit  the  number  of  times  a 
private  key  is  used  within  the  Bitcoin  protocol.  Each  wallet  corresponds  to  a  public/private  key  pair,  so  this  essentially 
limits  the  number  of  times  one  can  spend  from  a  given  wallet.  Thus,  by  creating  a  chain  of  wallets  and  transferring 
Bitcoins  from  one  wallet  to  the  next  it  is  easy  to  limit  the  number  of  signing  operations  carried  out  by  a  single  private 
key.  See  [9]  for  a  discussion  on  the  distribution  of  public  keys  currently  used  in  the  Bitcoin  network. 

The remainder of the paper is organised as follows:

- In Section 2 we present the background on ECDSA and the signed sliding window method (or wNAF representation) needed to understand our attack.
- In Section 3 we present our methodology for applying the Flush+Reload attack to the OpenSSL implementation of the signed sliding window method of exponentiation.
- In Section 4 we use the information so obtained to create a lattice problem, and we demonstrate the success probability of our attack.


2  Mathematical  Background 

In this section we present the mathematical background to our work: the ECDSA algorithm, and the wNAF/signed window method of point multiplication which OpenSSL uses to implement ECDSA for curves defined over prime finite fields.

ECDSA: The ElGamal Signature Scheme [20] is the basis of the 1994 US NIST standard, the Digital Signature Algorithm (DSA). ECDSA adapts one step of this algorithm from the multiplicative group of a finite field to the group of points on an elliptic curve. It is the signature algorithm based on elliptic curve cryptography that has seen widescale deployment. In this section we outline the algorithm, so as to fix notation for what follows:

Parameters: The scheme uses 'domain parameters', i.e. parameters which can be shared by a large number of users. These consist of an elliptic curve $E$ defined over a finite field $\mathbb{F}_q$ and a point $G \in E$ of large prime order $n$. The point $G$ is considered as a generator of the group of points of order $n$. Parameters chosen in this way are generally believed to offer a (symmetric) security level of $\sqrt{n}$, given current knowledge and technologies. The field size $q$ is usually taken to be a large odd prime or a power of 2. The OpenSSL implementation supports both cases, but in this paper we will focus on the case of $q$ being a large prime.

Public-Private Key pairs: The private key is an integer $a$, $1 < a < n - 1$, and the public key is the point $Q = [a]G$.
Calculating  the  private  key  from  the  public  key  requires  solving  the  ECDLP,  which  is  believed  to  be  hard  in  practice 
for  correctly  chosen  parameters.  The  most  efficient  currently  known  algorithms  for  solving  the  ECDLP  have  a  square 
root  run  time  in  the  size  of  the  group  [18,41],  hence  the  aforementioned  security  level. 

Signing: Suppose Bob, with private-public key pair $\{a, Q\}$, wishes to send a signed message $m$ to Alice. For ECDSA he follows these steps:

1. Using an approved hash algorithm, compute $e = \mathrm{Hash}(m)$. Take $h$ to be the integer (modulo $n$) given by the leftmost $\ell$ bits of $e$, where $\ell = \min(\log_2(n), \text{the bitlength of the hash})$.

2. Randomly select $k \in \mathbb{Z}_n$.

3. Compute the point $(x, y) = [k]G \in E$.

4. Take $r = x \bmod n$; if $r = 0$ then return to step 2.

5. Compute $s = k^{-1}(h + r\cdot a) \bmod n$; if $s = 0$ then return to step 2.

6. Bob sends $(m, r, s)$ to Alice.


Verification: To verify the signature on the message sent by Bob, Alice performs the following steps.

1. Check that all received parameters are correct: that $r, s \in [1, n-1]$ and that Bob's public key is valid, that is $Q \neq \mathcal{O}$ and
$Q \in E$ is of order n.

2. Using the same hash function and method as above, compute $h = \mathrm{Hash}(m) \pmod{n}$.

3. Compute $\hat{s} = s^{-1} \bmod n$.

4. Compute the point $(x, y) = [h \cdot \hat{s}]G + [r \cdot \hat{s}]Q$.

5. Verify that $r = x \bmod n$; otherwise reject the signature.
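As an illustration only, the signing and verification steps above can be sketched in Python for secp256k1. This is a minimal sketch, assuming SHA-256 as the hash and plain affine arithmetic with a simple double-and-add scalar multiplication (not OpenSSL's wNAF code, and not constant time, so it is itself exposed to the kind of side channel studied here). The helper names `ec_add`, `ec_mul`, `hash_to_int`, `sign` and `verify` are hypothetical, and Step 1 of verification is reduced to range checks on r and s.

```python
import hashlib
import secrets

# secp256k1 domain parameters (SEC 2): y^2 = x^3 + 7 over F_P
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def ec_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                   # P + (-P) = O
    if p1 == p2:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent (doubling)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def ec_mul(k, pt):
    """Left-to-right double-and-add (illustrative, not constant time)."""
    acc = None
    for bit in bin(k)[2:]:
        acc = ec_add(acc, acc)
        if bit == "1":
            acc = ec_add(acc, pt)
    return acc


def hash_to_int(msg):
    """Step 1: leftmost min(log2 n, 256) bits of SHA-256(m), reduced mod n."""
    e = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    return (e >> max(0, 256 - N.bit_length())) % N


def sign(a, msg):
    h = hash_to_int(msg)
    while True:
        k = secrets.randbelow(N - 1) + 1              # Step 2: ephemeral key
        x, _ = ec_mul(k, G)                           # Step 3: [k]G
        r = x % N                                     # Step 4
        s = pow(k, -1, N) * (h + r * a) % N           # Step 5
        if r != 0 and s != 0:
            return r, s


def verify(Q, msg, sig):
    r, s = sig
    if not (1 <= r < N and 1 <= s < N):              # Step 1 (range checks only)
        return False
    h = hash_to_int(msg)                              # Step 2
    s_hat = pow(s, -1, N)                             # Step 3
    X = ec_add(ec_mul(h * s_hat % N, G), ec_mul(r * s_hat % N, Q))  # Step 4
    return X is not None and X[0] % N == r            # Step 5
```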

ECDSA  is  a  very  brittle  algorithm,  in  the  sense  that  an  incorrectly  implemented  version  of  Step  2  of  the  signing 
algorithm  can  lead  to  catastrophic  security  weaknesses.  For  example,  an  inappropriate  reuse  of  the  random  integer 
led  to  the  highly  publicised  breaking  of  the  Sony  PS3  implementation  of  ECDSA.  Knowledge  of  the  random  value  k, 
often  referred  to  as  the  ephemeral  key,  leads  to  knowledge  of  the  secret  key,  since  given  a  message/signature  pair  and 
the  corresponding  ephemeral  key  one  can  recover  the  secret  key  via  the  equation 

$$a = r^{-1}(s \cdot k - h) \bmod n.$$
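The key-recovery equation is plain modular arithmetic and can be checked in a few lines of Python. In this sketch `recover_private_key` is a hypothetical name, and r is treated as an arbitrary non-zero value: the identity only uses the signing relation $s = k^{-1}(h + r \cdot a) \bmod n$, not the fact that r comes from the x-coordinate of $[k]G$.

```python
import secrets

# secp256k1 group order n
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def recover_private_key(r, s, h, k, n=N):
    """Since s*k = h + r*a (mod n), the key is a = r^{-1} (s*k - h) mod n."""
    return (s * k - h) * pow(r, -1, n) % n
```

A leaked (or reused and solved-for) ephemeral key k therefore exposes a immediately from a single signature.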


It is this equation that we exploit in our attack, using side-channel information obtained by a spy process. The spy
process targets the computationally expensive part of the signing algorithm, namely Step 3.

Scalar multiplication using wNAF: In OpenSSL, Step 3 of the signing algorithm is implemented using the wNAF
algorithm. Suppose we wish to compute $[d]G$ for some integer value $d \in [0, \ldots, 2^{\ell}]$. The wNAF method uses a small
amount of pre-processing on G, and the fact that addition and subtraction in the elliptic curve group have the same cost,
to obtain a large performance improvement over the basic binary method of point multiplication. To define wNAF,
a window size w is first chosen; for OpenSSL and the curve secp256k1 we have $w = 3$. Then $2^w - 2$ extra
points are stored, with a precomputation cost of $2^{w-1} - 1$ point additions and one point doubling. The values stored
are the points $\{\pm G, \pm[3]G, \ldots, \pm[2^w - 1]G\}$.

The next task is to convert the integer d into so-called Non-Adjacent Form (NAF). This is done by the method in
Algorithm 1, which rewrites the integer d as a sum $d = \sum_{i=0}^{\ell-1} d_i \cdot 2^i$, where $d_i \in \{0, \pm 1, \pm 3, \ldots, \pm(2^w - 1)\}$. The Non-Adjacent
Form is so named because, for any d written in NAF, the output values $d_0, \ldots, d_{\ell-1}$ are such that every non-zero
element $d_i$ is followed by at least w zero values before the next non-zero element.


Input: scalar d and window width w
Output: d in wNAF: d_0, ..., d_{ℓ-1}
ℓ ← 0
while d > 0 do
    if d mod 2 = 1 then
        d_ℓ ← d mod 2^{w+1}
        if d_ℓ > 2^w then
            d_ℓ ← d_ℓ - 2^{w+1}
        end
        d ← d - d_ℓ
    else
        d_ℓ ← 0
    end
    d ← d/2
    ℓ ← ℓ + 1
end

Algorithm 1: Conversion to Non-Adjacent Form
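A minimal Python sketch of Algorithm 1 follows, assuming Python integers and returning the digits least significant first. The function names `wnaf` and `min_zero_gap` are hypothetical. The example with $d = 183$ and $w = 3$ shows that consecutive non-zero digits are separated by runs of w zeros.

```python
def wnaf(d, w):
    """Algorithm 1: wNAF digits d_0, d_1, ... (least significant first).

    Every non-zero digit is odd with |d_i| < 2^w."""
    digits = []
    while d > 0:
        if d % 2 == 1:
            di = d % 2 ** (w + 1)          # d_i <- d mod 2^(w+1)
            if di > 2 ** w:
                di -= 2 ** (w + 1)         # map into (-2^w, 2^w)
            d -= di                        # now d = 0 mod 2^(w+1)
        else:
            di = 0
        digits.append(di)
        d //= 2
    return digits


def min_zero_gap(digits):
    """Smallest number of zeros between two consecutive non-zero digits."""
    nz = [i for i, di in enumerate(digits) if di != 0]
    return min((j - i - 1 for i, j in zip(nz, nz[1:])), default=None)


print(wnaf(183, 3))  # [7, 0, 0, 0, -5, 0, 0, 0, 1]
```

Here $183 = 7 - 5 \cdot 2^4 + 1 \cdot 2^8$.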


Once the integer d has been re-coded into wNAF form, the point multiplication can be carried out by Algorithm 2.
The occurrence of a non-zero $d_i$ controls when an addition is performed, with the precise value of $d_i$ determining
which point from the list is added.

Input: scalar d in wNAF d_0, ..., d_{ℓ-1} and precomputed points {±G, ±[3]G, ±[5]G, ..., ±[2^w - 1]G}
Output: [d]G
Q ← O
for j from ℓ - 1 downto 0 do
    Q ← [2]Q
    if d_j ≠ 0 then
        Q ← Q + [d_j]G
    end
end
return Q

Algorithm 2: Computation of [k]G using OpenSSL wNAF

Before ending this section we note some aspects of the algorithm, and how these are exploited in our attack. A spy
process, by monitoring the cache hits/misses, can determine when the code inside the if-then block in Algorithm 2
is performed. This happens when the element $d_i$ is non-zero, which reveals that the following w values
$d_{i+1}, \ldots, d_{i+w}$ are all zero. This reveals some information about the value d, but not enough to recover the value of
d itself.

Instead  we  focus  on  the  last  values  of  di  processed  by  Algorithm  2.  We  can  determine  precisely  how  many  least 
significant bits of d are zero, which means we can determine at least one bit of d, and with probability 1/2 we determine
two  bits,  with  probability  1/4  we  determine  three  bits  and  so  on.  Thus  we  not  only  extract  information  about  whether 
the  least  significant  bits  are  zero,  but  we  also  use  the  information  obtained  from  the  first  non-zero  bit. 
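The leak described above can be illustrated with a small Python model. It is a sketch only: it runs Algorithm 2 in the additive group of integers (so $[x]G$ is represented by x itself), records D for each doubling and A for each addition, and reads the low bits of d off the end of that sequence. All function names are hypothetical, and real traces also contain measurement errors (the "don't know" symbols of Section 4).

```python
def wnaf(d, w):
    """Algorithm 1 (compact copy): digits d_0, d_1, ..., least significant first."""
    digits = []
    while d > 0:
        di = 0
        if d & 1:
            di = d % (1 << (w + 1))
            if di > (1 << w):
                di -= 1 << (w + 1)
            d -= di
        digits.append(di)
        d >>= 1
    return digits


def add_double_trace(digits):
    """Model Algorithm 2 over the integers; return (result, A/D sequence)."""
    q, ops = 0, []
    for dj in reversed(digits):        # j from l-1 down to 0
        q *= 2                         # Q <- [2]Q
        ops.append("D")
        if dj != 0:
            q += dj                    # Q <- Q + [d_j]G
            ops.append("A")
    return q, "".join(ops)


def low_bits_from_trace(ops):
    """Trailing D's count the zero low bits; the A before them fixes bit z to 1.

    Returns (z, c, l) with d = c (mod 2^l)."""
    z = len(ops) - len(ops.rstrip("D"))
    return z, 1 << z, z + 1
```

For $d = 88$ (binary 1011000) and $w = 3$ the modelled sequence is DADDDDADDD, whose three trailing doublings, preceded by an addition, give $d \equiv 8 \pmod{16}$: four known bits.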

In practice, in the OpenSSL code the execution of Step 3 of the signing algorithm is slightly modified. Instead of computing $[k]G$, the code
computes $[k + \lambda \cdot n]G$ where $\lambda \in \{1, 2\}$ is chosen such that $\lfloor \log_2(k + \lambda \cdot n) \rfloor = \lfloor \log_2(n) \rfloor + 1$. The fixed-size scalar
provides protection against the Brumley and Tuveri remote timing attack [11]. For the secp256k1 curve, n is $2^{256} - e$
where $e < 2^{129}$. The case $\lambda = 2$, therefore, only occurs for $k < e$. As the probability of this case is less than $2^{-127}$, we
can assume the wNAF algorithm is applied with $d = k + n$.
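The choice of λ can be expressed in a few lines of Python. This sketch follows the rule as described above, assuming only the secp256k1 group order n; it is not OpenSSL's code, and `padded_scalar` and `E_GAP` are hypothetical names. Because $\lfloor \log_2(x) \rfloor$ equals the bit length of x minus one, the condition says that $k + \lambda \cdot n$ has exactly one more bit than n.

```python
# secp256k1 group order n
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

E_GAP = 2**256 - N    # the value written e in the text (n = 2^256 - e)


def padded_scalar(k, n=N):
    """Return (lam, k + lam*n) with lam in {1, 2} such that the padded scalar
    has n.bit_length() + 1 bits, i.e. floor(log2(k + lam*n)) = floor(log2 n) + 1."""
    for lam in (1, 2):
        d = k + lam * n
        if d.bit_length() == n.bit_length() + 1:
            return lam, d
    raise ValueError("expected 1 <= k < n")
```

Only scalars $k < e$ (a 129-bit bound) need $\lambda = 2$, which is why the text treats $d = k + n$ as the generic case.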


3  Attacking  OpenSSL 

In prior work the Montgomery ladder method of point multiplication was shown to be vulnerable to a Flush+Reload
attack [42]. This section discusses the wNAF implementation of OpenSSL and demonstrates that it is also
vulnerable. Unlike the side-channel in the Montgomery ladder implementation, which recovers enough bits to allow a
direct recovery of the ephemeral private key [42], the side-channel in the wNAF implementation only leaks an average
of two bits in each window. Consequently, a further algebraic attack is required to recover the private key. This section
describes the Flush+Reload attack and its application to the OpenSSL wNAF implementation. The next section
completes the recovery of the secret key.

Flush+Reload is a cache side-channel attack that exploits a property of the Intel implementation of the x86
and x86_64 processor architectures, which allows processes to manipulate the cache of other processes [42,43].

Using  the  attack,  a  spy  program  can  trace  or  monitor  memory  read  and  execute  access  of  a  victim  program  to  shared 
memory  pages.  The  spy  program  only  requires  read  access  to  the  shared  memory  pages,  hence  pages  containing  binary 
code  in  executable  files  and  in  shared  libraries  are  susceptible  to  the  attack.  Furthermore,  pages  shared  through  the  use 
of  memory  de-duplication  in  virtualized  environments  [6, 40]  are  also  susceptible  and  using  them  the  attack  can  be 
applied  between  co-located  virtual  machines. 

The  spy  program  needs  to  execute  on  the  same  physical  processor  as  the  victim,  however,  unlike  most  cache-based 
side channel attacks, our spy monitors access to the last-level cache (LLC). As the LLC is shared between the processing
cores of the processor, the spy does not need to execute on the same processing core as the victim. Consequently,
the  attack  is  applicable  to  multi-core  processors  and  is  not  dependent  on  hyperthreading  or  on  exploitable  scheduler 
limitations  like  other  published  microarchitectural  side-channel  attacks. 

To  monitor  access  to  memory,  the  spy  repeatedly  evicts  the  contents  of  the  monitored  memory  from  the  LLC, 
waits for some time and then measures the time to read the contents of the monitored memory. See Algorithm 3 for
pseudo-code of the attack. As reading from the LLC is much faster than reading from memory, the spy can differentiate
between these two cases. If, following the wait, the contents of the memory are retrieved from the cache, it indicates
that  another  process  has  accessed  the  memory.  Thus,  by  measuring  the  time  to  read  the  contents  of  the  memory,  the 
spy  can  decide  whether  the  victim  has  accessed  the  monitored  memory  since  the  last  time  it  was  evicted. 
Input: adrs, the probed address
Output: true if the address was accessed by the victim
begin
    evict(adrs)
    wait_a_bit()
    time ← current_time()
    tmp ← read(adrs)
    readTime ← current_time() - time
    return readTime < threshold
end

Algorithm 3: Flush+Reload Algorithm


Monitoring access to specific memory lines is one of the strengths of the Flush+Reload technique. Other cache-based
tracing techniques monitor access to sets of memory lines that map to the same cache set. The use of specific
memory lines reduces the chance of false positives. Capturing the access to the memory line, therefore, indicates
that the victim executes and has accessed the line. Consequently, Flush+Reload does not require any external
mechanism to synchronize with the victim.

We tested the attack on an HP Elite 8300 running Fedora 18. The machine features an Intel Core i5-3470 processor,
with four execution cores and a 6MB LLC. As the OpenSSL package shipped with Fedora does not support ECC, we
used our own build of OpenSSL 1.0.1e. For the experiment we used the curve secp256k1, which is used by Bitcoin.

For the attack, we used the implementation of Flush+Reload from [43]. The spy program divides time into
time slots of approximately 3,000 cycles (almost 1 μs). In each time slot the spy probes memory lines in the group add
and double functions (ec_GFp_simple_add and ec_GFp_simple_dbl, respectively). The time slot length is chosen
to ensure that there is an empty slot during the execution of each group operation. This allows the spy to correctly
distinguish consecutive doubles.

The  probes  are  placed  on  memory  lines  which  contain  calls  to  the  field  multiplication  function.  Memory  lines 
containing  call  sites  are  accessed  both  when  the  function  is  called  and  when  it  returns.  Hence,  by  probing  these 
memory  lines,  we  reduce  the  chance  of  missing  accesses  due  to  overlaps  with  the  probes.  See  [43]  for  a  discussion  of 
overlaps. 

To  find  the  memory  lines  containing  the  call  sites  we  built  OpenSSL  with  debugging  symbols.  These  symbols 
are  not  loaded  at  run  time  and  do  not  affect  the  performance  of  the  code.  The  debugging  symbols  are,  typically,  not 
available  for  attackers,  however  their  absence  would  not  present  a  major  obstacle  to  a  determined  attacker  who  could 
use  reverse  engineering  [16]. 


4  Lattice  Attack  Details 


We applied the above process to the OpenSSL implementation of ECDSA for the curve secp256k1. We fixed a public
key $Q = [a]G$, and then monitored via the Flush+Reload spy process the generation of a set of d signature pairs
$(r_i, s_i)$ for $i = 1, \ldots, d$. For each signature pair there is a known hashed message value $h_i$ and an unknown ephemeral
private key value $k_i$.

Using the Flush+Reload side-channel we also obtained, with very high probability, the sequence of point
additions and doublings used when OpenSSL executes the operation $[k_i + n]G$. In particular, this means we learn values
$c_i$ and $l_i$ such that

$$k_i + n \equiv c_i \pmod{2^{l_i}},$$

or equivalently

$$k_i \equiv c_i - n \pmod{2^{l_i}}.$$

Here $l_i$ denotes the number of known bits. We can also determine the length of the known run of zeroes in the least
significant bits of $k_i + n$, which we will call $z_i$. In presenting the analysis we assume the d signatures have been selected
such that we already know that $k_i + n$ is divisible by $2^Z$ for some value of Z, i.e. we pick signatures for
which $z_i \ge Z$. In practice this means that to obtain d such signatures we need to collect (on average) $d \cdot 2^Z$ signatures
in total.

We write $a_i = c_i - n \pmod{2^{l_i}}$. Writing A for an add, D for a double and X for a don't know, we
can read off $c_i$, $l_i$ and $z_i$ from the end of the execution sequence obtained in the Flush+Reload analysis. In practice the
Flush+Reload attack is so efficient that we are able to identify A's and D's with almost 100% certainty, with only
a fraction ε of between 0.55% and 0.65% of the symbols turning out to be don't knows. To read off the values we use the following table
(and its obvious extension), where we present the approximate probability of our attack revealing each sequence.


Sequence   c_i   l_i   z_i   Pr ≈
...X       0     0     0     ε
...A       1     1     0     (1-ε)/2
...XD      0     1     1     ε·(1-ε)/2
...AD      2     2     1     ((1-ε)/2)^2
...XDD     0     2     2     ε·((1-ε)/2)^2
...ADD     4     3     2     ((1-ε)/2)^3
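Reading the table can be sketched in Python as follows. This is an illustration only, with hypothetical function names, assuming the trace is given as a string over {A, D, X} that ends with the last group operation of $[k_i + n]G$. The rule is: the trailing D's give the zero bits; if they are preceded by a certain A, the next bit is known to be one.

```python
# secp256k1 group order n
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def read_tail(ops):
    """(c_i, l_i, z_i) from the end of an A/D/X sequence, as in the table."""
    z = len(ops) - len(ops.rstrip("D"))          # trailing doublings
    if z < len(ops) and ops[-z - 1] == "A":
        return 1 << z, z + 1, z                  # z zero bits, then a one bit
    return 0, z, z                               # preceded by X (or nothing)


def known_low_bits(ops, n=N):
    """a_i = c_i - n mod 2^{l_i}: the known low bits of k_i itself."""
    c, l, _ = read_tail(ops)
    return (c - n) % (1 << l), l
```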

For a given execution of the Flush+Reload attack, from the table we can determine $c_i$ and $l_i$, and hence $a_i$. Then,
using the standard analysis from [31,32], we determine the following values

$$t_i = \left[\, r_i / (2^{l_i} \cdot s_i) \,\right]_n,$$

$$u_i = \left[\, (a_i - h_i/s_i)/2^{l_i} \,\right]_n + n/2^{l_i+1},$$

where $[\cdot]_n$ denotes reduction modulo n into the range $[0, \ldots, n)$. We then have that

$$v_i = |a \cdot t_i - u_i|_n, \qquad |v_i| \le n/2^{l_i+1}, \tag{1}$$

where $|\cdot|_n$ denotes reduction modulo n, but into the range $(-n/2, \ldots, n/2)$. It is this latter equation which we exploit, via
lattice basis reduction, so as to recover a. The key observation found in [31, 32] is that the value $v_i$ is smaller (by a
factor of $2^{l_i+1}$) than a random integer. Unlike prior work in this area we do not (necessarily) need to select only those
executions which give us a "large" value of $z_i$, say $z_i \ge 3$. Prior work fixes a minimum value of $z_i$ (or, essentially
equivalently, of $l_i$) and uses this single value in all equations such as (1). If we did this we would need to throw away
all but $1/2^{l+1}$ of the executions obtained. By maintaining full generality, i.e. a variable value of $z_i$ (subject to the
constraint $z_i \ge Z$) in each instance of (1), we are able to use all the information at our disposal and recover the secret
key a with very little effort indeed.
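A small Python check of equation (1) on synthetic data may help. It is a sketch under stated assumptions: $r_i$ is drawn at random instead of being the x-coordinate of $[k_i]G$ (the derivation only uses $s_i = k_i^{-1}(h_i + r_i \cdot a) \bmod n$), the $l_i$ low bits of $k_i$ are taken as known, and the non-integer offset $n/2^{l_i+1}$ is rounded down, which loosens the bound by at most 1. The names `centered` and `hnp_values` are hypothetical; `pow(2, -l, n)` needs Python 3.8 or later.

```python
import secrets

# secp256k1 group order n
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def centered(x, n=N):
    """|x|_n: reduce x modulo n into the range (-n/2, n/2]."""
    x %= n
    return x - n if x > n // 2 else x


def hnp_values(r, s, h, a_low, l, n=N):
    """(t_i, u_i) for one signature whose l low bits a_low of k_i are known."""
    s_inv = pow(s, -1, n)
    two_inv_l = pow(2, -l, n)                      # 2^{-l} mod n
    t = r * s_inv * two_inv_l % n                  # [r_i / (2^{l_i} s_i)]_n
    u = (a_low - h * s_inv) * two_inv_l % n + (n >> (l + 1))
    return t, u


# Synthetic check: a*t_i - u_i is about l_i + 1 bits shorter than n.
a = secrets.randbelow(N - 1) + 1
for l in range(1, 9):
    k, h, r = (secrets.randbelow(N - 1) + 1 for _ in range(3))
    s = pow(k, -1, N) * (h + r * a) % N            # signing relation
    t, u = hnp_values(r, s, h, k % 2**l, l)
    assert abs(centered(a * t - u)) <= (N >> (l + 1)) + 1
```

The bound holds because $a \cdot t_i - u_i \equiv (k_i - a_i)/2^{l_i} - n/2^{l_i+1} \pmod n$, and $(k_i - a_i)/2^{l_i}$ lies in $[0, n/2^{l_i})$.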

The next task is to turn the equations from (1) into a lattice problem. Following [31, 32] we do this in one of two
possible ways, which we now recap.

Attack via CVP: We first consider the lattice $L(B)$ in $(d+1)$-dimensional real space, generated by the rows of the
following matrix

$$B = \begin{pmatrix} 2^{l_1+1} \cdot n & & & \\ & \ddots & & \\ & & 2^{l_d+1} \cdot n & \\ 2^{l_1+1} \cdot t_1 & \cdots & 2^{l_d+1} \cdot t_d & 1 \end{pmatrix}.$$

From (1) we find that there are integers $(\lambda_1, \ldots, \lambda_d)$ such that if we set $x = (\lambda_1, \ldots, \lambda_d, a)$, $y = (2^{l_1+1} \cdot v_1, \ldots, 2^{l_d+1} \cdot v_d, a)$
and $u = (2^{l_1+1} \cdot u_1, \ldots, 2^{l_d+1} \cdot u_d, 0)$, then we have

$$x B - u = y.$$

We note that the 2-norm of the vector y is about $\sqrt{d+1} \cdot n$, whereas the lattice determinant of $L(B)$ is $2^{\sum_i (l_i+1)} \cdot n^d$. Thus
the vector u is a close vector to the lattice. Solving the Closest Vector Problem (CVP) with input B and u therefore
reveals x and hence the secret key a.
Attack via SVP: It is often more effective in practice to solve the above CVP problem by embedding
the CVP into a Shortest Vector Problem (SVP) in a slightly bigger lattice. In particular we take the lattice $L(B')$ in
$(d+2)$-dimensional real space generated by the rows of the matrix

$$B' = \begin{pmatrix} B & 0 \\ u & n \end{pmatrix}.$$

This lattice has determinant $2^{\sum_i (l_i+1)} \cdot n^{d+1}$. By taking the lattice vector generated by $x' = (x, -1) = (\lambda_1, \ldots, \lambda_d, a, -1)$ we obtain the lattice
vector $y' = x' \cdot B' = (y, -n)$. The 2-norm of this lattice vector is roughly $\sqrt{d+2} \cdot n$. We expect the second vector in a
reduced basis to be of size $c \cdot n$, and so there is a "good" chance for a suitably strong lattice reduction to obtain a lattice
basis whose second vector is equal to $y'$. Note that the first basis vector is likely to be given by $(-t_1, \ldots, -t_d, n, 0) \cdot B' =
(0, \ldots, 0, n, 0)$.
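The two lattice constructions can be checked numerically with plain integer row vectors. The Python sketch below is illustrative only: it builds B, the target u and $B'$ from synthetic values with a known key a (r is drawn at random rather than taken from a curve point), and verifies $xB - u = y$, $x'B' = (y, -n)$ and $(-t_1, \ldots, -t_d, n, 0) \cdot B' = (0, \ldots, 0, n, 0)$. It performs no lattice reduction, which in practice would be delegated to a library such as fplll; all helper names are hypothetical.

```python
import secrets

# secp256k1 group order n
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def centered(x, n=N):
    """Reduce x modulo n into (-n/2, n/2]."""
    x %= n
    return x - n if x > n // 2 else x


def synthetic_instance(a, ls, n=N):
    """Hypothetical helper: (t_i, u_i, v_i, lambda_i) for fake signatures on key a."""
    rows = []
    for l in ls:
        k, h, r = (secrets.randbelow(n - 1) + 1 for _ in range(3))
        s_inv = pow(pow(k, -1, n) * (h + r * a) % n, -1, n)   # 1 / s_i
        two_inv_l = pow(2, -l, n)
        t = r * s_inv * two_inv_l % n
        u = (k % 2**l - h * s_inv) * two_inv_l % n + (n >> (l + 1))
        v = centered(a * t - u, n)
        rows.append((t, u, v, (v - (a * t - u)) // n))        # exact division
    return rows


def build_bases(ts, us, ls, n=N):
    """Rows of B ((d+1) x (d+1)), the target u, and B' ((d+2) x (d+2))."""
    d = len(ts)
    B = [[0] * (d + 1) for _ in range(d + 1)]
    for i, (t, l) in enumerate(zip(ts, ls)):
        B[i][i] = 2 ** (l + 1) * n                 # diagonal 2^{l_i+1} n
        B[d][i] = 2 ** (l + 1) * t                 # last row 2^{l_i+1} t_i
    B[d][d] = 1
    target = [2 ** (l + 1) * u for u, l in zip(us, ls)] + [0]
    B_prime = [row + [0] for row in B] + [target + [n]]
    return B, target, B_prime


def row_times(x, M):
    """Integer row vector x times matrix M given as a list of rows."""
    return [sum(xi * row[j] for xi, row in zip(x, M)) for j in range(len(M[0]))]
```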


4.1  Experimental  Results 

To  solve  the  SVP  problem  we  used  the  BKZ  algorithm  [37]  as  implemented  in  fplll  [12].  However,  this  implementation 
is  only  efficient  for  small  block  size  (say  less  than  35),  due  to  the  fact  that  BKZ  is  an  exponential  algorithm  in  the 
block size. Thus for larger block sizes we implemented a variant of the BKZ-2.0 algorithm [15]; however, this algorithm
is only effective for block sizes β greater than 50. In tuning BKZ-2.0 we used the following strategy: at the end of
every round we determined whether we had already solved for the private key; if not we continued, and we gave up
after ten rounds. As stated above we applied our attack to the curve secp256k1.

We  wished  to  determine  what  the  optimal  strategy  was  in  terms  of  the  minimum  value  of  Z  we  should  take,  the 
optimal  lattice  dimension,  and  the  optimal  lattice  algorithm.  Thus  we  performed  a  number  of  experiments  which  are 
reported in Tables 2, 3 and 4 in Appendix A, where we present our best results obtained for each $(d, Z)$ pair.
We also present graphs to show how the different values of β affected the success rate. For each lattice dimension,
we measured the optimal parameters as the ones which minimized the value of lattice execution time divided by
probability of success. The probability of success was measured by running the attack a number of times, and seeing
in how many executions we managed to recover the underlying secret key. Time divided by Probability is a
crude measure of success, and we note that it hides other issues such as the expected number of executions of the signature
algorithm needed.

All executions were performed on an Intel Xeon CPU running at 2.40 GHz, on a machine with 4GB of RAM. The
programs were run in a single thread, and so no advantage was taken of the multiple cores on the processor. We ran
experiments  for  the  SVP  attack  using  BKZ  with  block  size  ranging  from  5  to  40  and  with  BKZ-2.0  with  blocksize  50. 
With  our  crude  measure  of  Time  divided  by  Probability  we  find  that  BKZ  with  block  size  15  or  20  is  almost  always 
the  method  of  choice  for  the  SVP  method. 

We  see  that  the  number  of  signatures  needed  is  consistent  with  what  theory  would  predict  in  the  case  of  Z  =  1  and 
Z  =  2,  i.e.  the  lattice  reduction  algorithm  can  extract  from  the  side-channel  the  underlying  secret  key  as  soon  as  the 
expected number of leaked bits slightly exceeds the number of bits in the secret key. For Z = 0 this no longer holds;
we conjecture that this is because the lattice algorithms are unable to reduce the basis well enough, in a short enough
amount of time, to extract the small amount of information which is revealed by each signature. In other words the
input basis for Z = 0 looks too much like a random basis, unless a large number of signatures is used.

To solve the CVP problem variant we applied pre-processing with either fplll or BKZ-2.0. When applying BKZ-2.0
pre-processing we limited it to only one round of execution. We then applied an enumeration technique, akin to
the  enumeration  used  in  the  enumeration  sub-routine  of  BKZ,  but  centered  around  the  target  close  vector  as  opposed 
to  the  origin.  When  a  close  vector  was  found  this  was  checked  to  see  whether  it  revealed  the  secret  key,  and  if  not 
the  enumeration  was  continued.  We  restricted  the  number  of  nodes  in  the  enumeration  tree  to  2^®,  so  as  to  ensure  the 
enumeration  did  not  go  on  for  an  excessive  amount  of  time  in  the  cases  where  the  solution  vector  is  hard  to  find  (this 
mainly  affected  the  experiments  in  dimension  greater  than  150).  See  Tables  5,  6  and  7,  in  Appendix  A,  for  details  of 
these experiments; again we present the best results for each $(d, Z)$ pair. The enumeration time is highly dependent on
whether the target vector is really close to the lattice; thus we see that when the expected number of bits revealed
per signature, times the number of signatures utilized in the lattice, gets close to the bit size of the elliptic curve (256), the
enumeration time drops. Again we see that extensive pre-processing of the basis with more complex lattice reduction
techniques provides no real benefit.

The  results  of  the  SVP  and  CVP  experiments  (Appendix  A)  show  that  for  fixed  Z,  increasing  the  dimension 
generally  decreases  the  overall  expected  running  time.  In  some  sense,  as  the  dimension  increases  more  information  is 
being  added  to  the  lattice  and  this  makes  the  desired  solution  vector  stand  out  more.  The  higher  block  sizes  perform 
better  in  the  lower  dimensions,  as  the  stronger  reduction  allows  them  to  isolate  the  solution  vector  better.  The  lower 
block  sizes  perform  better  in  the  higher  dimensions,  as  the  high-dimensional  lattices  already  contain  much  information 
and  strong  reduction  is  not  required. 

The  one  exception  to  this  rule  is  the  case  of  Z  =  2  in  the  CVP  experiments.  In  dimensions  below  80  the  CVP  can 
be  solved  relatively  quickly  here,  whereas  in  dimensions  80  up  to  100  it  takes  more  time.  This  can  be  explained  as 
follows:  in  the  low  dimension  the  CVP-tree  is  not  very  big,  but  contains  many  solutions.  This  means  that  enumeration 
of  the  CVP-tree  is  very  quick,  but  the  solution  vector  is  not  unique.  Thus,  the  probability  of  success  is  equal  to  the 
probability  of  finding  the  right  vector.  From  dimension  80  upwards,  we  expect  the  solution  vector  to  be  unique,  but  the 
CVP-trees  become  much  bigger  on  average.  If  we  do  not  stop  the  enumeration  after  a  fixed  number  of  nodes,  it  will 
find  the  solution  with  high  probability,  but  the  enumeration  takes  much  longer.  Here,  the  probability  of  success  is  the 
probability  of  finding  a  solution  at  all. 

We first note, for both our lattice variants, that there is a wide variation in the probability of success; we would
expect this to stabilize if we ran a larger batch of tests. However, even with this caveat we notice a number of
remarkable facts. Firstly, recall we are trying to break a 256-bit elliptic curve private key. The conventional wisdom has
been that using a window-style exponentiation method and a side-channel which only records a distinction between
addition and doubling (i.e. does not identify which additions), one would need much more than 256 executions to
recover the secret key. However, we see that we have a good chance of recovering the key with fewer than this. For
example, Nguyen and Shparlinski [32] estimated needing $23 \times 2^7 = 2944$ signatures to recover a 160-bit key, when
seven consecutive zero bits of the ephemeral private key were detected. Namely they would use a lattice of dimension
23, but require 2944 signatures to obtain 23 signatures for which seven consecutive zero bits of the ephemeral
private key could be determined. Note that $23 \cdot 7 = 161 > 160$. Liu and Nguyen [26] extended this attack
by using improved lattice algorithms, decreasing the number of signatures required. We are able to have a reasonable
chance of success with as few as 200 signatures obtained against a 256-bit key.

In  our  modification  of  the  lattice  attack  we  not  only  utilize  zero  least  significant  bits,  but  also  notice  that  the  end  of 
a  run  of  zeros  tells  us  that  the  next  bit  is  one.  In  addition  we  utilize  all  of  the  run  of  zeros  (say  for  example  eight)  and 
not  just  some  fixed  pre-determined  number  (such  as  four).  This  explains  our  improved  lattice  analysis,  and  shows  that 
one  can  recover  the  secret  with  relatively  high  probability  with  just  a  small  number  of  measurements. 

As  a  second  note  we  see  that  strong  lattice  reduction,  i.e.  high  block  sizes  in  the  BKZ  algorithm,  or  even  applying 
BKZ-2.0,  does  not  seem  to  gain  us  very  much.  Indeed  acquiring  a  few  extra  samples  allows  us  to  drop  down  to  using 
BKZ with blocksize twenty in almost all cases. Note that in many of our experiments a smaller value of β resulted in
a much lower probability of success (often zero), whilst a higher value of β resulted in a significantly increased run
time. 

Thirdly, we note that if one is unsuccessful on one run, one does not need to derive a whole new set of traces:
by increasing the number of traces a little, one can either take a new random sample of the traces one has,
or increase the lattice dimension used.

We end by presenting in Table 1 the best variant of the lattice attack, measured in terms of the minimal value of
Time divided by Probability of success, for the number of signatures obtained. We see that in a very short amount of
time we can recover the secret key from 260 signatures, and with more effort we can even recover it from the Flush+Reload
attack applied to as few as 200 signatures. It is not clear whether the SVP or the CVP approach
is the best strategy.

Expected  SVP/  d    Z =        Pre-processing and/or  Time (s)  Prob.        100 ×
# sigs    CVP        min{z_i}   SVP algorithm                    success (%)  Time/Prob
200       SVP   100  1          BKZ (β = 30)           611.13     3.5         17460
220       SVP   110  1          BKZ (β = 25)            78.67     2.0          3933
240       CVP    60  2          BKZ (β = 25)             2.68     0.5           536
260       CVP    65  2          BKZ (β = 10)             2.26     5.5            41
280       CVP    70  2          BKZ (β = 15)             4.46    29.5            15
300       CVP    75  2          BKZ (β = 20)            13.54    53.0            26
320       SVP    80  2          BKZ (β = 20)             6.67    22.5            29
340       SVP    85  2          BKZ (β = 20)             9.15    37.0            24
360       SVP    90  2          BKZ (β = 15)             6.24    23.5            26
380       SVP    95  2          BKZ (β = 15)             6.82    36.0            19
400       SVP   100  2          BKZ (β = 15)             7.22    33.5            21
420       SVP   105  2          BKZ (β = 15)             7.74    43.0            18
440       SVP   110  2          BKZ (β = 15)             8.16    49.0            16
460       SVP   115  2          BKZ (β = 15)             8.32    52.0            16
480       CVP   120  2          BKZ (β = 10)            11.55    87.0            13
500       CVP   125  2          BKZ (β = 10)            10.74    93.5            12
520       CVP   130  2          BKZ (β = 10)            10.50    96.0            11
540       SVP   135  2          BKZ (β = 10)             7.44    55.0            13

Table 1. Combined Results. The best lattice parameter choice for each number of signatures obtained (in steps of 20)


5 Mitigation

As our attack requires capturing multiple signatures, one way of mitigating it is to limit the number of times a private
key is used for signing. Bitcoin, which uses the secp256k1 curve on which this work focuses, recommends using a
new key for each transaction [30]. This recommendation, however, is not always followed [36], exposing users to the
attack.

Another option to reduce the effectiveness of the Flush+Reload part of the attack would be to exploit the inherent
properties of this "Koblitz" curve within the OpenSSL implementation, which would also have the positive
side effect of speeding up the scalar multiplication operation. The use of the GLV method [19] for point multiplication
would not completely thwart the above attack, but would, in theory, reduce its effectiveness. The GLV method is used
to speed up the computation of point scalar multiplication when the elliptic curve has an efficiently computable endomorphism.
This partial solution is only applicable to elliptic curves with easily computable automorphisms and a
sufficiently large automorphism group, such as the curve secp256k1 which we used in our example.

The curve secp256k1 is defined over a prime field of characteristic p with $p \equiv 1 \bmod 6$. This means that $\mathbb{F}_p$
contains a primitive 6th root of unity ζ, and if $(x, y)$ is in the group of points on E, then $(-\zeta x, y)$ is also. In fact,
$(-\zeta x, y) = [\lambda](x, y)$ for some λ with $\lambda^3 \equiv 1 \bmod n$. Since the computation of $(-\zeta x, y)$ from $(x, y)$ costs only one finite
field multiplication (far less than computing $[\lambda](x, y)$), this can be used to speed up scalar multiplication: instead of
computing $[k]G$, one computes $[k_0]G + [k_1]([\lambda]G)$ where $k_0, k_1$ are around the size of $\sqrt{n}$. This is known to be one of
the fastest methods of performing scalar multiplication [19]. The computation of $[k_0]G + [k_1]([\lambda]G)$ is not done using
two scalar multiplications followed by a point addition, but uses the so-called Straus-Shamir trick, which uses joint double-and-add
operations [19, Alg 1] to perform the two scalar multiplications and the addition simultaneously.
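The endomorphism can be checked directly on secp256k1 in Python. This sketch uses plain affine double-and-add and hypothetical helper names; it computes the primitive cube roots of unity β modulo p (β plays the role of $-\zeta$ above) and the primitive cube roots λ modulo n, and searches for the pairs with $[\lambda]G = (\beta \cdot x_G, y_G)$. It only demonstrates the identity and does not implement the GLV decomposition of k into $k_0, k_1$.

```python
# secp256k1 (SEC 2): y^2 = x^3 + 7 over F_P, group order N, generator G
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def ec_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def ec_mul(k, pt):
    """Left-to-right double-and-add (illustrative, not constant time)."""
    acc = None
    for bit in bin(k)[2:]:
        acc = ec_add(acc, acc)
        if bit == "1":
            acc = ec_add(acc, pt)
    return acc


def primitive_cube_roots(m):
    """Both primitive cube roots of unity modulo a prime m with m = 1 (mod 3)."""
    for g in range(2, m):
        beta = pow(g, (m - 1) // 3, m)       # beta^3 = g^(m-1) = 1
        if beta != 1:
            return beta, beta * beta % m
    raise ValueError("no primitive cube root of unity")


def glv_pairs():
    """All (beta, lam) with [lam]G equal to (beta * x_G, y_G)."""
    gx, gy = G
    return [(beta, lam)
            for beta in primitive_cube_roots(P)
            for lam in primitive_cube_roots(N)
            if ec_mul(lam, G) == (beta * gx % P, gy)]
```

Each of the two values of β matches exactly one λ, so the expensive multiplication by λ can be replaced by one field multiplication.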

The GLV method alone would be vulnerable to simple side-channel analysis. To make the execution regular, and so thwart simple power
analysis and timing attacks, it is necessary to re-code the scalars $k_0$ and $k_1$ and use the comb method as developed and assembled in [17].
Using the attack presented above we are able to recover around 2 bits of information about the secret key for
each signature monitored. If the GLV method were used in conjunction with wNAF, the number of bits (on average)
leaked per signature would be reduced to 4/3. It is also possible to extend the GLV method to representations of k in
terms of higher powers of λ, for example writing $k = k_0 + k_1 \lambda + \cdots + k_t \lambda^t \bmod n$. For $t = 2$ the estimated rate of bit
leakage would be 6/7 bits per signature (though this extension is not possible for the example curve due to the order
of the automorphism).

We see that using the GLV method can reduce the number of leaked bits, but it is not sufficient to prevent the attack. A positive flip side of this, and of the attack of [42], is that implementing algorithms which improve the efficiency of scalar multiplication seems, at present, to reduce the effectiveness of the attacks.
Scalar blinding techniques [10, 27] use arithmetic operations on the scalar to hide its value from potential attackers. The method suggested by these works is to compute $[k + m_1 \cdot n + m_2]G - [m_2]G$, where $m_1$ and $m_2$ are small (e.g. 32-bit) numbers. The random values mask the bits of the scalar and prevent the spy from recovering the scalar from the leaked data.

The information leak in our attack originates from the use of the sliding window in the wNAF algorithm for scalar multiplication. Hence, an immediate fix for the problem is to use a fixed-window algorithm for scalar multiplication. A naive implementation of a fixed-window algorithm may still be vulnerable to the Prime+Probe attack, e.g. by adapting the technique of [35]. To provide protection against the attack, the implementation must prevent any data flow from sensitive key data to memory access patterns. Methods for achieving this are used in NaCl [8], which ensures that the sequence of memory accesses it performs does not depend on the private key. A similar solution is available in the implementation of modular exponentiation in OpenSSL, which attempts to access the same sequence of memory lines irrespective of the private key. However, this approach may leak information [7, 39].
Fig. 1. SVP experiments: $d$ vs. Time/Prob for various $\beta$ and $Z = \min z_i = 0$
Fig. 2. SVP experiments: $d$ vs. Time/Prob for various $\beta$ and $Z = \min z_i = 1$
Fig. 3. SVP experiments: $d$ vs. Time/Prob for various $\beta$ and $Z = \min z_i = 2$
Fig. 4. CVP experiments: $d$ vs. Time/Prob for various $\beta$ and $Z = \min z_i = 0$
Fig. 5. CVP experiments: $d$ vs. Time/Prob for various $\beta$ and $Z = \min z_i = 1$
Fig. 6. CVP experiments: $d$ vs. Time/Prob for various $\beta$ and $Z = \min z_i = 2$
Abstract. We extend the Flush+Reload side-channel attack of Benger et al. to extract a significantly larger number of bits of information per observed signature when using OpenSSL. This means that by observing only 25 signatures, we can recover secret keys of the secp256k1 curve, used in the Bitcoin protocol, with a probability greater than 50 percent. This is an order of magnitude improvement over the previously best known result.

The new method of attack exploits two points. First, unlike previous partial disclosure attacks, we utilize all information obtained and not just that in the least significant or most significant bits; this is enabled by a property of the group order chosen for the "standard" curves, which allows extra bits of information to be extracted. Second, whereas previous works require direct information on ephemeral key bits, our attack utilizes the indirect information from the wNAF double and add chain.


1  Introduction 

The Elliptic Curve Digital Signature Algorithm (ECDSA) is the elliptic curve analogue of the Digital Signature Algorithm (DSA). It has been well known for over a decade that the randomization used within the DSA/ECDSA algorithm makes it susceptible to side-channel attacks. In particular, a small leakage of information on the ephemeral secret key utilized in each signature can be combined over a number of signatures to obtain the entire key.

Howgrave-Graham and Smart [14] showed that DSA is vulnerable to such partial ephemeral key exposure, and their work was made rigorous by Nguyen and Shparlinski [21], who also extended these results to ECDSA [22]. More specifically, if, for a polynomially bounded number of random messages and ephemeral keys, about $\log^{1/2} q$ least significant bits (LSBs) are known, the secret key $\alpha$ can be recovered in polynomial time. A similar result holds for a consecutive sequence of the most significant bits (MSBs), with a potential need for an additional leaked bit due to the paucity of information encoded in the most significant bit of the ephemeral key. When an arbitrary sequence of consecutive bits in the ephemeral key is known, about twice as many bits are required. The attack works by constructing a lattice problem from the obtained digital signatures and side-channel information, and then applying lattice reduction techniques such as LLL [16] or BKZ [23] to solve this lattice problem.

Brumley and co-workers employ this lattice attack to recover ECDSA keys using leaked LSBs (in [4]) and leaked MSBs (in [5]). The former uses a cache side-channel to extract the leaked information and the latter exploits a timing side-channel. In both attacks, a fixed number of bits from each signature is used, and signatures are used only if the values of these bits are all zero. Signatures in which the value of any of these bits is one are ignored. Consequently, both attacks require more than 2,500 signatures to break a 160-bit private key.

More recently, again using a cache-based side-channel, Benger et al. [2] use the LSBs of the ephemeral key for a wNAF (a.k.a. sliding window algorithm) multiplication technique. By combining a new side-channel called the Flush+Reload side-channel [26, 27] with a more precise lattice attack strategy, which utilizes all of the leaked LSBs from every signature, Benger et al. are able to significantly reduce the number of signatures required. In particular, they report that the full secret key of a 256-bit system can be recovered with about 200 signatures in a reasonable length of time, and with a reasonable probability of success.

In this work we extend the Flush+Reload technique of Benger et al. to reduce the number of required signatures by an order of magnitude. Our methodology no longer concentrates on extracting bits in just the MSB and LSB positions, and instead focuses on all the information leaked by all the bits of the ephemeral key. In particular, we exploit a property of many of the standardized elliptic curves as used in OpenSSL. Our method, just as in [2], applies the Flush+Reload side-channel technique to the wNAF elliptic curve point multiplication algorithm in OpenSSL.
ECDSA Using Standard Elliptic Curves: The domain parameters for ECDSA are an elliptic curve $E$ over a field $\mathbb{F}$ and a point $G$ on $E$ of order $q$. Given a hash function $h$, the ECDSA signature of a message $m$, with a private key $0 < \alpha < q$ and public key $Q = \alpha G$, is computed by:

- Selecting a random ephemeral key $0 < k < q$.

- Computing $r = x(kG) \bmod q$, the $x$-coordinate of $kG$.

- Computing $s = k^{-1}(h(m) + \alpha \cdot r) \bmod q$.

The process is repeated if either $r = 0$ or $s = 0$. The pair $(r, s)$ is the signature.
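The signing equation is why leakage about $k$ matters: once $k$ is known, $\alpha$ follows from a single signature. The sketch below checks this purely at the level of scalars modulo $q$. As an assumption, `r` is drawn at random rather than computed as $x(kG) \bmod q$ (no curve arithmetic is done), and the helper names are hypothetical; modular inverses via `pow(x, -1, q)` need Python 3.8 or later.

```python
import random

# secp256k1 group order (SEC 2), assembled from its 8-hex-digit groups.
Q = int("FFFFFFFF" * 3 + "FFFFFFFE" + "BAAEDCE6AF48A03BBFD25E8CD0364141", 16)


def sign_scalars(alpha, k, h, r, q=Q):
    """s = k^{-1} (h + alpha * r) mod q.

    Assumption: r is supplied directly instead of being computed as
    x(kG) mod q, because only the scalar relation is illustrated here.
    """
    return pow(k, -1, q) * (h + alpha * r) % q


def key_from_ephemeral(k, h, r, s, q=Q):
    """Solve s*k = h + alpha*r (mod q) for alpha once k is known."""
    return (s * k - h) * pow(r, -1, q) % q


rng = random.Random(7)
alpha, k = rng.randrange(1, Q), rng.randrange(1, Q)
h, r = rng.randrange(Q), rng.randrange(1, Q)
s = sign_scalars(alpha, k, h, r)
print(key_from_ephemeral(k, h, r, s) == alpha)
```

Rearranged as $k \equiv \alpha \cdot r \cdot s^{-1} + h \cdot s^{-1} \pmod q$, the same relation is what turns partial knowledge of $k$ into the lattice problems used below.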

To increase interoperability, standards bodies have published several sets of domain parameters for ECDSA [1, 7, 20]. The choice of moduli for the fields used in these standard curves is partly motivated by efficiency arguments. For example, all of the moduli in the curves recommended by FIPS [20] are generalised Mersenne primes [24] and many of them are pseudo-Mersenne primes [10]. This choice of moduli facilitates efficient modular arithmetic by avoiding a division operation which may otherwise be required.

A consequence of using pseudo-Mersenne primes as moduli is that, due to Hasse's Theorem, not only is the finite-field order close to a power of two, but so is the elliptic-curve group order. That is, $q$ can be expressed as $2^n - \varepsilon$, where $|\varepsilon| < 2^p$ for some $p \approx n/2$. We demonstrate that such curves are more susceptible to partial disclosure of ephemeral keys than was hitherto known. This property increases the amount of information that can be used from partial disclosure and allows for a more effective attack on ECDSA.
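As a quick check that $q$ lies close to a power of two, the sketch below writes $q = 2^n - \varepsilon$ and reports the bit length $p$ of $\varepsilon$. The orders are the published SEC 2 values for secp256k1 and secp521r1, assembled from their 8-hex-digit groups; `order_gap` is an illustrative helper that assumes $q$ lies just below $2^n$ (so $\varepsilon > 0$).

```python
# Group orders from SEC 2, assembled from their published 8-hex-digit groups.
SECP256K1_Q = int("FFFFFFFF" * 3 + "FFFFFFFE" + "BAAEDCE6AF48A03BBFD25E8CD0364141", 16)
SECP521R1_Q = int("01FF" + "FFFFFFFF" * 7 + "FFFFFFFA"
                  + "51868783BF2F966B7FCC0148F709A5D0"
                  + "3BB5C9B8899C47AEBB6FB71E91386409", 16)


def order_gap(q):
    """Return (n, eps, p) with q = 2^n - eps and eps < 2^p.

    Assumes q lies just below the power of two 2^n, where n is the bit
    length of q, as it does for these curves (so eps is positive).
    """
    n = q.bit_length()
    eps = (1 << n) - q
    return n, eps, eps.bit_length()


for name, q in [("secp256k1", SECP256K1_Q), ("secp521r1", SECP521R1_Q)]:
    n, eps, p = order_gap(q)
    print(f"{name}: n = {n}, p = {p}, p/n = {p / n:.3f}")
```

For these two orders the sketch gives $p = 129$ and $p = 259$, the values used in the heuristic analysis of Section 4.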

Our Contribution: We demonstrate that the above property of the standardized curves allows the utilization of far more leaked information, in particular some arbitrary sequences of consecutive leaked bits. In a nutshell, adding or subtracting $q$ to or from an unknown number is unlikely to change any bits in positions between $p + 1$ and $n$. Based on this observation we are able to use (for wNAF multiplication algorithms) all the information in consecutive bit sequences in positions above $p + 1$. Since in many of the standard curves $p \approx n/2$, a large amount of information is leaked per signature (assuming one can extract the sequence of additions and doublings in an algorithm). As identified by Ciet and Joye [8] and exploited by Feix et al. [11], the same property also implies that techniques for mitigating side-channel attacks, such as the scalar blinding suggested in [4, 18], do not protect bits in positions above $p + 1$.
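The "in a nutshell" observation can be tested empirically. The Monte Carlo sketch below assumes the unknown number is a uniformly random $n$-bit integer (arithmetic taken mod $2^n$), adds and subtracts the secp256k1 order $q$, and records the highest bit position that changes. A carry or borrow can only travel $j$ positions beyond position $p$ by passing through a run of $j$ identical bits, which happens with probability about $2^{-j}$, so the changed region stays close to $p$. The function names are hypothetical.

```python
import random

# secp256k1 group order q (SEC 2), assembled from its 8-hex-digit groups.
Q = int("FFFFFFFF" * 3 + "FFFFFFFE" + "BAAEDCE6AF48A03BBFD25E8CD0364141", 16)
N = Q.bit_length()                  # n = 256
P = ((1 << N) - Q).bit_length()     # q = 2^n - eps with 0 < eps < 2^P, P = 129


def highest_changed_bit(k, delta, n=N):
    """Highest position below n where k and (k + delta) mod 2^n differ, or -1."""
    return (k ^ ((k + delta) % (1 << n))).bit_length() - 1


def experiment(trials=10_000, seed=0):
    """For random n-bit k, record the highest bit changed by adding or subtracting q."""
    rng = random.Random(seed)
    tops = []
    for _ in range(trials):
        k = rng.randrange(1 << N)
        tops.append(max(highest_changed_bit(k, Q), highest_changed_bit(k, -Q)))
    return tops


tops = experiment()
for j in (0, 4, 8, 16):
    share = sum(t >= P + j for t in tops) / len(tops)
    print(f"some bit at position >= P + {j} changed in {share:.4f} of trials")
```

The printed shares should fall off roughly geometrically in $j$, which is the sense in which the upper bits survive the addition of $q$ (or of a small multiple of it, as in scalar blinding).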

Prior works deal with the case of partial disclosure of consecutive sequences of bits of the ephemeral key. Our work offers two improvements. First, it demonstrates how to use partial information leaked from the double and add chains of the wNAF scalar multiplication algorithm [13, 19]. In most cases, the double and add chain does not provide direct information on the value of bits: it only identifies sequences of repeating bits without identifying the value of these bits. We show how to use this information to construct a lattice attack on the private key. Secondly, our attack does not depend on the leaked bits being consecutive. We use information leaked through the double and add chain even though it is spread out along the ephemeral key.

By using more leaked information and exploiting the above property of the elliptic curves, our attack requires only a handful of leaked signatures to fully break the private key. Our experiments show that the perfect information leaked on the double and add chains of only 13 signatures is sufficient for recovering the 256-bit private key of the secp256k1 curve with probability greater than 50 percent. For the 521-bit curve secp521r1, 40 signatures are required. We further demonstrate that for the secp256k1 case, observing 25 signatures is highly likely to yield 13 perfect double and add chains. Hence, by observing 25 Bitcoin transactions using the same key, an attacker can expect to recover the private key. For most of the paper we discuss the case of perfect side channels, which result in perfect double and add chains; in Section 6 we show how this assumption can be removed in the context of a real Flush+Reload attack.

2  Background 

In this section we discuss three basic procedures we will be referring to throughout: the Flush+Reload side-channel attack technique, the wNAF scalar multiplication method, and the use of lattices to extract secret keys from triples. The side-channel information we obtain from executing the wNAF algorithm produces instances of the Hidden Number Problem (HNP) [3]. Since the HNP is traditionally studied via lattice reduction, it is not surprising that we are led to lattice reduction in our analysis.
2.1 The Flush+Reload Side-Channel Attack Technique

Flush+Reload is a recently discovered cache side-channel attack [26, 27]. The attack exploits a weakness in the Intel implementation of the popular x86 architecture, which allows a spy program to monitor other programs' read or execute access to shared regions of memory. The spy program only requires read access to the monitored memory.

Unlike most cache side-channel attacks, Flush+Reload uses the Last-Level Cache (LLC), which is the cache level closest to the memory. The LLC is shared by the execution cores in the processor, allowing the attack to operate when the spy and victim processes execute on different cores. Furthermore, as most virtual machine hypervisors (VMMs) actively share memory between co-resident virtual machines, the attack is applicable to virtualized environments and works cross-VM.


Input: adrs, the probed address
Output: true if the address was accessed by the victim
begin
    evict(adrs)
    wait_a_bit()
    time ← current_time()
    tmp ← read(adrs)
    readTime ← current_time() - time
    return readTime < threshold
end

Algorithm 1: Flush+Reload Algorithm


To monitor access to memory, the spy repeatedly evicts the contents of the monitored memory from the LLC, waits for some time and then measures the time to read the contents of the monitored memory. See Algorithm 1 for pseudo-code of the attack. Flush+Reload uses the x86 clflush instruction to evict contents from the cache. To measure time, the spy uses the rdtsc instruction, which returns the time since processor reset measured in processor cycles.

As reading from the LLC is much faster than reading from memory, the spy can differentiate between these two cases. If, following the wait, the contents of memory are retrieved from the cache, this indicates that another process has accessed the memory. Thus, by measuring the time to read the contents of memory, the spy can decide whether the victim has accessed the monitored memory since the last time it was evicted.

To implement the attack, the spy needs to share the monitored memory with the victim. For attacks occurring within the same machine, the spy can map files used by the victim into its own address space. Examples of these files include the victim program file, shared libraries or data files that the victim accesses. As all mapped copies of files are shared, this gives the spy access to memory pages accessed by the victim. In virtualized environments, the spy does not have access to the victim's files. The spy can, however, map copies of the victim files to its own address space, and rely on the VMM to merge the two copies using page de-duplication [15, 25]. It should be pointed out that, as the LLC is physically tagged, the virtual address at which the spy maps the files is irrelevant for the attack. Hence, Flush+Reload is oblivious to address space layout randomization [17].

This sharing only works when the victim does not make private modifications to the contents of the shared pages. Consequently, many Flush+Reload attacks target executable code pages, monitoring the times the victim executes specific code. The spy typically divides time into fixed-width time slots. In each time slot the spy monitors a few memory locations and records the times that these locations were accessed by the victim. By reconstructing a trace of victim accesses, the spy is able to infer the data the victim is operating on. Prior works used this attack to recover the private key of GnuPG RSA [27] as well as to recover the ephemeral key used in OpenSSL ECDSA signatures, either completely, for curves over binary fields [26], or partially, for curves over prime fields [2].
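The decision step of Algorithm 1 and the per-slot trace described above can be simulated in a few lines. This is only a sketch of the bookkeeping: the reload times are supplied as input rather than measured with clflush and rdtsc, the threshold of 120 cycles is a hypothetical value that would need calibration on the target machine, and the location names "add" and "dbl" are illustrative.

```python
# Hypothetical threshold in CPU cycles separating LLC hits from memory reads;
# a real spy must calibrate this value on the target machine.
THRESHOLD = 120


def was_accessed(read_time, threshold=THRESHOLD):
    """Decision step of Algorithm 1: a fast reload means the line was cached,
    i.e. the victim touched it since the last flush."""
    return read_time < threshold


def build_trace(slots, threshold=THRESHOLD):
    """Per-slot sets of monitored locations that the victim accessed.

    `slots` is a list with one dict per time slot, mapping a monitored
    location name to the reload time measured for it in that slot.
    """
    return [{loc for loc, t in slot.items() if was_accessed(t, threshold)}
            for slot in slots]


# Made-up reload times for two monitored code locations over three slots.
example = [{"add": 80, "dbl": 300}, {"add": 310, "dbl": 95}, {"add": 290, "dbl": 305}]
print(build_trace(example))
```

Turning such a trace into a double and add chain additionally requires knowing how many slots each operation spans, which depends on the slot length and is not modelled here.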

2.2  The  wNAF  Scalar  Multiplication  Method 

Several algorithms for computing the scalar multiplication $kG$ have been proposed. One of the suggested methods is to use the windowed non-adjacent form (wNAF) representation of the scalar $k$, see [13]. In wNAF a number is represented by a sequence of digits $k_i$. The value of a digit $k_i$ is either zero or an odd number $-2^w < k_i < 2^w$, with each pair of non-zero digits separated by at least $w$ zero digits. The value of $k$ can be calculated from its wNAF representation using $k = \sum_i 2^i \cdot k_i$. See Algorithm 2 for a method to convert a scalar $k$ into its wNAF representation. We use $|\cdot|_x$ to denote the reduction modulo $x$ into the range $[-x/2, \ldots, x/2)$.


Input: Scalar k and window width w
Output: k in wNAF: k_0, k_1, k_2, ...
begin
    e ← k
    i ← 0
    while e > 0 do
        if e mod 2 = 1 then
            k_i ← |e|_{2^{w+1}}
            e ← e - k_i
        else
            k_i ← 0
        end
        e ← e/2
        i ← i + 1
    end
end

Algorithm 2: Conversion to Non-Adjacent Form
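Algorithm 2 translates directly into Python. The sketch below follows it step by step, using the centred reduction $|\cdot|_{2^{w+1}}$, and also derives the sequence of doublings (D) and additions (A) that a left-to-right wNAF scalar multiplication would perform, which is what the side channel is assumed to reveal. The helper names are illustrative, and the leading doubling of the point at infinity, which real implementations skip, is kept for simplicity.

```python
def centered_mod(x, m):
    """|x|_m: reduce x modulo m into [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r


def wnaf(k, w):
    """Digits k_0, k_1, ... of k in width-w NAF, following Algorithm 2."""
    digits, e = [], k
    while e > 0:
        if e % 2 == 1:
            d = centered_mod(e, 1 << (w + 1))   # odd digit with -2^w < d < 2^w
            e -= d                              # now e is divisible by 2^(w+1)
        else:
            d = 0
        digits.append(d)
        e //= 2
    return digits


def double_add_chain(digits):
    """Operation sequence of a left-to-right wNAF scalar multiplication:
    one doubling (D) per digit and an addition (A) per non-zero digit."""
    ops = []
    for d in reversed(digits):
        ops.append("D")
        if d != 0:
            ops.append("A")
    return "".join(ops)


digits = wnaf(7, 2)
print(digits, double_add_chain(digits))
```

The chain records where the additions happen but not the sign or value of the digit added, which is the indirect information Section 3 works with.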


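Let $\hat{k}_i$ be the value of the variable $e$ at the start of the $i$-th iteration in Algorithm 2. From the algorithm, it is clear that

$$k_i = \begin{cases} 0 & \hat{k}_i \text{ is even} \\ |\hat{k}_i|_{2^{w+1}} & \hat{k}_i \text{ is odd} \end{cases} \qquad (1)$$

Furthermore:

$$k = 2^i \cdot \hat{k}_i + \sum_{j<i} 2^j \cdot k_j \qquad (2)$$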

Let $m$ and $m + l$ be the positions of two consecutive non-zero wNAF digits, i.e. $k_m, k_{m+l} \neq 0$ and $k_{m+i} = 0$ for all $0 < i < l$. We now have

$$-2^{m+w} < \sum_{i \le m} k_i \cdot 2^i < 2^{m+w}, \qquad (3)$$

and because $l > w$, we get $-2^{m+l-1} < \sum_{i < m+l} k_i \cdot 2^i < 2^{m+l-1}$. Substituting $m$ for $m + l$ gives

$$-2^{m-1} < \sum_{i \le m-1} k_i \cdot 2^i < 2^{m-1}. \qquad (4)$$

We note that for the minimal $m$ such that $k_m \neq 0$ we have $\sum_{i \le m-1} k_i \cdot 2^i = 0$. Hence (4) holds for every $m$ such that $k_m \neq 0$.

Because $k_m$ is odd, we have $-(2^w - 1) \le k_m \le 2^w - 1$. Adding $k_m \cdot 2^m$ to (4) gives a slightly stronger version of (3):

$$-(2^{m+w} - 2^{m-1}) < \sum_{i \le m} k_i \cdot 2^i < 2^{m+w} - 2^{m-1}. \qquad (5)$$
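Bound (5) can be spot-checked numerically. The sketch below walks along the wNAF digits, maintains the partial sum $S = \sum_{i \le m} k_i 2^i$, and tests (5) at every non-zero digit; both sides are multiplied by 2 so that $2^{m-1}$ stays an integer at $m = 0$. It is a sanity check on random scalars, not a proof, and the function names are hypothetical.

```python
import random


def centered_mod(x, m):
    """|x|_m: reduce x modulo m into [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r


def wnaf(k, w):
    """wNAF digits k_0, k_1, ... of k (Algorithm 2)."""
    digits, e = [], k
    while e > 0:
        d = centered_mod(e, 1 << (w + 1)) if e % 2 else 0
        digits.append(d)
        e = (e - d) // 2
    return digits


def check_bound_5(k, w):
    """True if |sum_{i<=m} k_i 2^i| < 2^(m+w) - 2^(m-1) at every non-zero k_m,
    tested in the doubled form |2S| < 2^(m+w+1) - 2^m."""
    partial = 0
    for m, d in enumerate(wnaf(k, w)):
        partial += d * 2**m
        if d != 0 and not abs(2 * partial) < 2**(m + w + 1) - 2**m:
            return False
    return True


rng = random.Random(11)
print(all(check_bound_5(rng.randrange(1, 1 << 256), w)
          for w in range(2, 6) for _ in range(200)))
```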

One consequence of using negative wNAF digits is that the wNAF representation may be one digit longer than the binary representation of the number. For $n$-digit binary numbers Möller [19] suggests using $k_i \leftarrow \lfloor e \rfloor_{2^w}$ when $i = n - w - 1$ and $e$ is odd, where $\lfloor \cdot \rfloor_x$ denotes the reduction modulo $x$ into the interval $[0, \ldots, x)$. This avoids extending the wNAF representation in half the cases, at the cost of weakening the non-adjacency property of the representation.

2.3  Lattice  background 

Before  we  describe  how  to  get  the  necessary  information  from  the  side-channel  attack,  we  recall  from  previous  works 
what  kind  of  information  we  are  looking  for.  As  in  previous  works  [2, 4, 5, 14, 21, 22],  the  side-channel  information 
is  used  to  construct  a  lattice  basis  and  the  secret  key  is  then  retrieved  by  solving  a  lattice  problem  on  this  lattice. 
Generally, for a private key $\alpha$ and a group order $q$, the authors of previous works derive, by various means, triples $(t_i, u_i, z_i)$ from the side-channel information such that

$$-q/2^{z_i+1} < v_i = |\alpha \cdot t_i - u_i|_q < q/2^{z_i+1}. \qquad (6)$$

Note that for arbitrary $\alpha$ and $t_i$, the values of $v_i$ are uniformly distributed over the interval $[-q/2, q/2)$. Hence, each such triple provides about $z_i$ bits of information about $\alpha$. The use of a different $z_i$ per equation was introduced in [2].
If we take $d$ such triples we can construct the following lattice basis

$$B = \begin{pmatrix} 2^{z_1+1} \cdot q & & & \\ & \ddots & & \\ & & 2^{z_d+1} \cdot q & \\ 2^{z_1+1} \cdot t_1 & \cdots & 2^{z_d+1} \cdot t_d & 1 \end{pmatrix},$$

whose rows generate the lattice that we use to retrieve the secret key. Now consider the vector $u = (2^{z_1+1} \cdot u_1, \ldots, 2^{z_d+1} \cdot u_d, 0)$, which consists of known quantities. Equation (6) implies the existence of integers $(\lambda_1, \ldots, \lambda_d)$ such that for the vectors $x = (\lambda_1, \ldots, \lambda_d, \alpha)$ and $y = (2^{z_1+1} \cdot v_1, \ldots, 2^{z_d+1} \cdot v_d, \alpha)$ we have

$$x \cdot B - u = y.$$

Again using Equation (6), we see that the 2-norm of the vector $y$ is at most $\sqrt{d \cdot q^2 + \alpha^2} \le \sqrt{d+1} \cdot q$. Because the lattice determinant of $L(B)$ is $2^{d + \sum z_i} \cdot q^d$, the lattice vector $x \cdot B$ is heuristically the closest lattice vector to $u$. By solving the Closest Vector Problem (CVP) on input of the basis $B$ and the target vector $u$, we obtain $x$ and hence the secret key $\alpha$.
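The construction of $B$ and $u$, and the identity $x \cdot B - u = y$, can be checked on synthetic data. The sketch below assumes the key $\alpha$ is known so that it can generate triples satisfying (6) and compute the hidden integers $\lambda_i$; in an attack, $x$ is exactly what the CVP solver has to find. It uses plain integer lists and performs no lattice reduction, which in practice would be delegated to an LLL/BKZ implementation such as fplll. The function names are hypothetical.

```python
import math
import random

# secp256k1 group order, used here only as a concrete prime q.
Q = int("FFFFFFFF" * 3 + "FFFFFFFE" + "BAAEDCE6AF48A03BBFD25E8CD0364141", 16)


def centered_mod(x, m):
    """|x|_m: reduce x modulo m into [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r


def make_triples(alpha, zs, rng, q=Q):
    """Synthetic triples (t_i, u_i, z_i) satisfying (6) for a known key alpha."""
    triples = []
    for z in zs:
        t = rng.randrange(q)
        bound = q >> (z + 1)                  # floor(q / 2^(z+1)) < q / 2^(z+1)
        v = rng.randrange(-bound, bound + 1)  # the hidden small value v_i
        triples.append((t, centered_mod(alpha * t - v, q), z))
    return triples


def cvp_instance(triples, q=Q):
    """Rows of the basis B and the target vector u of Section 2.3."""
    d = len(triples)
    rows = [[2 ** (z + 1) * q if j == i else 0 for j in range(d + 1)]
            for i, (_, _, z) in enumerate(triples)]
    rows.append([2 ** (z + 1) * t for t, _, z in triples] + [1])
    target = [2 ** (z + 1) * u for _, u, z in triples] + [0]
    return rows, target


def vec_mat(x, rows):
    """Row vector x times the matrix given by its rows."""
    return [sum(xi * row[j] for xi, row in zip(x, rows)) for j in range(len(rows[0]))]


def hidden_vector(alpha, triples, q=Q):
    """x = (lambda_1, ..., lambda_d, alpha), computable only with the secret."""
    lams = []
    for t, u, _ in triples:
        v = centered_mod(alpha * t - u, q)
        lams.append((v + u - alpha * t) // q)  # exact: v + u - alpha*t is a multiple of q
    return lams + [alpha]


rng = random.Random(3)
alpha = rng.randrange(1, Q)
zs = [1, 2, 3, 2, 4]
triples = make_triples(alpha, zs, rng)
B, u = cvp_instance(triples)
x = hidden_vector(alpha, triples)
y = [a - b for a, b in zip(vec_mat(x, B), u)]
print(y[-1] == alpha, sum(c * c for c in y) < (len(zs) + 1) * Q * Q)
```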

There are two important methods of solving the closest vector problem: using an exact CVP-solver or using the heuristic embedding technique to convert it to a Shortest Vector Problem (SVP). Exact CVP-solvers require exponential time in the lattice rank ($d + 1$ in our case), whereas the SVP instance that follows from the embedding technique can sometimes be solved using approximation methods that run in polynomial time. Because the ranks of the lattices in this work become quite high when attacking a 521-bit key, we mostly focus on using the embedding technique and solving the associated SVP instance in this case.

The embedding technique transforms the previously described basis $B$ and target vector $u$ to a new basis $B'$, resulting in a new lattice of dimension one higher than that generated by $B$:

$$B' = \begin{pmatrix} B & 0 \\ u & q \end{pmatrix}.$$

Following the same reasoning as above, we can set $x' = (x, -1)$ and obtain the lattice vector $y' = x' \cdot B' = (y, -q)$. The 2-norm of $y'$ is upper bounded by approximately $\sqrt{d+2} \cdot q$, whereas this lattice has determinant $2^{d + \sum z_i} \cdot q^{d+1}$. Note, however, that this lattice also contains the vector

$$(-t_1, \ldots, -t_d, q, 0) \cdot B' = (0, \ldots, 0, q, 0),$$

which will most likely be the shortest vector of the lattice. Still, our approximation algorithms for SVP work on bases, and it is easy to see that any basis of the same lattice must contain a vector ending in $\pm q$. Thus, it is heuristically likely that the resulting basis contains the short vector $y'$, which reveals $\alpha$.
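The embedding step is a small transformation of the same data. The sketch below builds $B'$ from $B$ and $u$ and checks the two vector identities stated above on a toy instance ($q = 1009$ with two made-up triples, chosen only for readability); the names are hypothetical and no reduction is performed.

```python
def cvp_instance(triples, q):
    """Rows of B and the target u of Section 2.3 for triples (t_i, u_i, z_i)."""
    d = len(triples)
    rows = [[2 ** (z + 1) * q if j == i else 0 for j in range(d + 1)]
            for i, (_, _, z) in enumerate(triples)]
    rows.append([2 ** (z + 1) * t for t, _, z in triples] + [1])
    target = [2 ** (z + 1) * u for _, u, z in triples] + [0]
    return rows, target


def embedding_basis(rows, target, q):
    """B' = [[B, 0], [u, q]]: append a zero column to B, then the row (u, q)."""
    return [row + [0] for row in rows] + [target + [q]]


def vec_mat(x, rows):
    """Row vector x times the matrix given by its rows."""
    return [sum(xi * row[j] for xi, row in zip(x, rows)) for j in range(len(rows[0]))]


q = 1009
triples = [(5, 7, 1), (11, 3, 2)]          # toy (t_i, u_i, z_i), for illustration only
B, u = cvp_instance(triples, q)
Bp = embedding_basis(B, u, q)

# The unwanted short vector (-t_1, ..., -t_d, q, 0) * B' = (0, ..., 0, q, 0).
print(vec_mat([-t for t, _, _ in triples] + [q, 0], Bp))

# For any x, x' = (x, -1) gives x' * B' = (x * B - u, -q).
x = [3, -2, 17]
print(vec_mat(x + [-1], Bp))
```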

To  summarize,  we  turn  the  side-channel  information  into  a  lattice  and  claim  that,  heuristically,  finding  the  secret 
key  is  equivalent  to  solving  a  CVP  instance.  Then,  we  claim  that,  again  heuristically,  solving  this  CVP  instance 
is  equivalent  to  solving  an  SVP  instance  using  the  embedding  technique.  In  Section  5  we  will  apply  the  attack  to 
simulated  data  to  see  whether  these  heuristics  hold  up. 


3  Using  the  wNAF  Information 

Assume we have a side channel that leaks the double and add chain of the scalar multiplication. We already know how to use the leaked LSBs [2]; these leaked LSBs carry, on average, two bits of information.
Given a double and add chain, the positions of the add operations in the chain correspond to the non-zero digits in the wNAF representation of the ephemeral key $k$. Roughly speaking, in half the cases the distance between consecutive non-zero digits is $w + 1$, in a quarter of the cases it is $w + 2$, and so on. Hence, the average distance between consecutive non-zero digits is $\sum_{i \ge 1} (w + i)/2^i = w + 2$. Since a distance of $w + i$ occurs with probability $2^{-i}$, each distance carries $\sum_{i \ge 1} i/2^i = 2$ bits of entropy, so we expect that the double and add chain carries two bits of information per non-zero digit position.
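The claims that the distances average $w + 2$ and carry about two bits each can be checked by simulation. The sketch below assumes uniformly random scalars (with the top bit set) and measures the mean distance between consecutive non-zero wNAF digits together with the empirical entropy of the distance distribution; the function names and sample sizes are illustrative choices.

```python
import math
import random
from collections import Counter


def centered_mod(x, m):
    """|x|_m: reduce x modulo m into [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r


def wnaf(k, w):
    """wNAF digits k_0, k_1, ... of k (Algorithm 2)."""
    digits, e = [], k
    while e > 0:
        d = centered_mod(e, 1 << (w + 1)) if e % 2 else 0
        digits.append(d)
        e = (e - d) // 2
    return digits


def gap_statistics(n_bits=256, w=3, samples=2000, seed=1):
    """Mean and empirical entropy (in bits) of the distance between
    consecutive non-zero wNAF digits of random n_bits-bit scalars."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(samples):
        k = rng.randrange(1 << (n_bits - 1), 1 << n_bits)
        nz = [i for i, d in enumerate(wnaf(k, w)) if d != 0]
        gaps.extend(b - a for a, b in zip(nz, nz[1:]))
    total = len(gaps)
    entropy = -sum(c / total * math.log2(c / total) for c in Counter(gaps).values())
    return sum(gaps) / total, entropy


mean, entropy = gap_statistics()
print(f"mean distance {mean:.3f} (w + 2 = 5), entropy {entropy:.3f} bits")
```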

Reducing this information to an instance of the HNP presents three challenges:

- The information is not consecutive, but is spread along the scalar.

- Due to the use of negative digits in the wNAF representation, the double and add chain does not provide direct information on the bits of the scalar.

- Current techniques lose half the information when the information is not at the beginning or end of the scalar.

As described in [2], the OpenSSL implementation departs slightly from the description of ECDSA in Section 1. As a countermeasure to the Brumley and Tuveri remote timing attack [5], OpenSSL adds $q$ or $2 \cdot q$ to the randomly chosen ephemeral key, ensuring that $k$ is $n + 1$ bits long. While the attack is only applicable to curves defined over binary fields, the countermeasure is applied to all curves. Consequently, our analysis assumes that $2^n \le k < 2^{n+1}$.
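The padding step can be sketched as follows. It mirrors the behaviour described above (add $q$, and add it again if the result is not yet longer than $q$); it is not OpenSSL's code, and `pad_ephemeral` is a hypothetical name. Assuming, as for secp256k1, that $q = 2^n - \varepsilon$ lies just below $2^n$, a single addition suffices exactly when $k \ge \varepsilon$.

```python
# secp256k1 group order (SEC 2); q = 2^256 - eps with 0 < eps < 2^129.
Q = int("FFFFFFFF" * 3 + "FFFFFFFE" + "BAAEDCE6AF48A03BBFD25E8CD0364141", 16)


def pad_ephemeral(k, q=Q):
    """Return k + q, or k + 2q if k + q is not longer than q.

    For q just below a power of two, the result has exactly one bit more
    than q and is congruent to k modulo q.
    """
    k_pad = k + q
    if k_pad.bit_length() <= q.bit_length():
        k_pad += q
    return k_pad


eps = (1 << Q.bit_length()) - Q
for k in (1, eps - 1, eps, Q - 1):
    print(k >= eps, pad_ephemeral(k).bit_length())
```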

To  handle  non-consecutive  information,  we  extract  a  separate  HNP  instance  for  each  consecutive  set  of  bits,  and 
use  these  in  the  lattice.  The  effect  this  has  on  the  lattice  attack  is  discussed  in  Section  4. 

To handle the indirect information caused by the negative digits in the wNAF representation, we find a linear combination of $k$ in which we know the values of some consecutive bits, and use it to build an HNP instance.

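Let $m$ and $m + l$ be the positions of two consecutive non-zero wNAF digits where $m + l < n$. From the definition of the wNAF representation and (2) we know that $k = \hat{k}_{m+l} \cdot 2^{m+l} + \sum_{i \le m} k_i \cdot 2^i$, where $\hat{k}_{m+l}$ is odd because $k_{m+l} \neq 0$. We can now define the following values:

$$a = \frac{\hat{k}_{m+l} - 1}{2}, \qquad c = \sum_{i \le m} k_i \cdot 2^i + 2^{m+w}.$$

By (5) we have

$$2^{m-1} < c < 2^{m+w+1} - 2^{m-1}. \qquad (7)$$

From (2) we have

$$k - 2^{m+l} + 2^{m+w} = a \cdot 2^{m+l+1} + c,$$

where $0 \le a < 2^{n-m-l}$, and because $l \ge w + 1$ there are $l - w$ consecutive zero bits (positions $m + w + 1$ to $m + l$) in $k - 2^{m+l} + 2^{m+w}$.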
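The decomposition above can be verified on random scalars. The sketch recomputes $\hat{k}_{m+l}$ from (2), forms $a$ and $c$, and checks the identity $k - 2^{m+l} + 2^{m+w} = a \cdot 2^{m+l+1} + c$, bound (7) (doubled to stay in integers), and that bits $m + w + 1, \ldots, m + l$ are zero. It is a numerical sanity check under the assumption of uniformly random scalars; the helper names are hypothetical.

```python
import random


def centered_mod(x, m):
    """|x|_m: reduce x modulo m into [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r


def wnaf(k, w):
    """wNAF digits k_0, k_1, ... of k (Algorithm 2)."""
    digits, e = [], k
    while e > 0:
        d = centered_mod(e, 1 << (w + 1)) if e % 2 else 0
        digits.append(d)
        e = (e - d) // 2
    return digits


def check_decomposition(k, w):
    """Check the Section 3 decomposition for every pair of consecutive
    non-zero wNAF digits at positions m and m + l."""
    digits = wnaf(k, w)
    nz = [i for i, d in enumerate(digits) if d != 0]
    for m, ml in zip(nz, nz[1:]):
        l = ml - m
        low = sum(d * 2**i for i, d in enumerate(digits[: m + 1]))  # sum_{i<=m} k_i 2^i
        k_hat, rem = divmod(k - low, 2**ml)          # value of e at iteration m + l, by (2)
        if rem != 0 or k_hat % 2 != 1:
            return False
        a, c = (k_hat - 1) // 2, low + 2**(m + w)
        x = k - 2**ml + 2**(m + w)
        if x != a * 2**(ml + 1) + c:
            return False
        if not 2**m < 2 * c < 2**(m + w + 2) - 2**m:  # (7), doubled
            return False
        if (x >> (m + w + 1)) % 2**(l - w) != 0:      # the l - w zero bits
            return False
    return True


rng = random.Random(21)
print(all(check_decomposition(rng.randrange(1 << 256, 1 << 257), w)
          for w in (3, 4) for _ in range(100)))
```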

In order to extract this information, we rely on a property of the curve where the group order $q$ is close to a power of two. More precisely, $q = 2^n - \varepsilon$ where $|\varepsilon| < 2^p$ for $p \approx n/2$. We note that many of the standard curves have this property.

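Let $K = A \cdot 2^n + C$, with $0 \le A < 2^{L_1}$ and $2^{p+L_1} \le C < 2^{L_1+L_2} - 2^{p+L_1}$; note that this implies $L_2 > p$. Because $q = 2^n - \varepsilon$ we get $K - A \cdot q = K - A \cdot 2^n + A \cdot \varepsilon = C + A \cdot \varepsilon$. Now, $|\varepsilon| < 2^p$, so $|A \cdot \varepsilon| < 2^{p+L_1}$. Consequently, $0 \le K - A \cdot q < 2^{L_1+L_2}$ and we get $|K - 2^{L_1+L_2-1}|_q < 2^{L_1+L_2-1}$. For $p + 1 < m < n - l$ we can set

$$L_1 = n - m - l, \qquad L_2 = m + w,$$

$$C = c \cdot 2^{n-m-l-1} = c \cdot 2^{L_1-1},$$

$$K = (k - 2^{m+l} + 2^{m+w}) \cdot 2^{n-m-l-1} = (k - 2^{m+l} + 2^{m+w}) \cdot 2^{L_1-1} = a \cdot 2^n + C,$$

so that $K$ has the form above with $A = a$. From (7) we obtain $2^{L_1+m-2} < C < 2^{L_1+L_2} - 2^{L_1+m-2}$, which, because $m \ge p + 2$, becomes $2^{p+L_1} \le C < 2^{L_1+L_2} - 2^{p+L_1}$. Thus, we have

$$|K - 2^{L_1+L_2-1}|_q < 2^{L_1+L_2-1}.$$

Noting that $k \equiv \alpha \cdot r \cdot s^{-1} + h \cdot s^{-1} \pmod q$, we can define the values

$$t = |r \cdot s^{-1} \cdot 2^{L_1-1}|_q, \qquad u = |2^{L_1+L_2-1} - (h \cdot s^{-1} - 2^{m+l} + 2^{m+w}) \cdot 2^{L_1-1}|_q,$$

$$v = |\alpha \cdot t - u|_q = |K - 2^{L_1+L_2-1}|_q.$$

Then $|v| < 2^{L_1+L_2-1} = 2^{n-(l-w)-1} \approx q/2^{(l-w)+1}$, which gives us an instance of the HNP which carries $l - w$ bits of information.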
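The full chain from a perfect double-and-add trace to HNP triples can be checked end to end. The sketch below uses the secp256k1 parameters $n = 256$, $p = 129$, $w = 3$ quoted in Section 4, builds $t$ and $u$ as defined above for every usable pair $(m, m + l)$, and confirms $|\alpha t - u|_q < 2^{n-(l-w)-1}$. Two assumptions keep it small: the signature is simulated at the level of scalars (a random $r$ stands in for $x(kG) \bmod q$), and the padded key is drawn uniformly from $[2^n, 2^{n+1})$. All function names are hypothetical.

```python
import random

# secp256k1 parameters as used in the text: q = 2^n - eps, 0 < eps < 2^p, window w.
Q = int("FFFFFFFF" * 3 + "FFFFFFFE" + "BAAEDCE6AF48A03BBFD25E8CD0364141", 16)
N, P, W = 256, 129, 3


def centered_mod(x, m):
    """|x|_m: reduce x modulo m into [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r


def wnaf(k, w):
    """wNAF digits k_0, k_1, ... of k (Algorithm 2)."""
    digits, e = [], k
    while e > 0:
        d = centered_mod(e, 1 << (w + 1)) if e % 2 else 0
        digits.append(d)
        e = (e - d) // 2
    return digits


def hnp_triples(k, r, s, h, q=Q, n=N, p=P, w=W):
    """Triples (t, u, z) from the non-zero wNAF positions of the padded key k.

    Only the positions of non-zero digits are used (what a perfect double and
    add chain reveals); k itself is needed here only to simulate that chain.
    """
    nz = [i for i, d in enumerate(wnaf(k, w)) if d != 0]
    s_inv = pow(s, -1, q)
    triples = []
    for m, m_next in zip(nz, nz[1:]):
        l = m_next - m
        if not p + 1 < m < n - l:
            continue
        L1, L2 = n - m - l, m + w
        scale = 1 << (L1 - 1)
        t = centered_mod(r * s_inv * scale, q)
        u = centered_mod((1 << (L1 + L2 - 1))
                         - (h * s_inv - (1 << (m + l)) + (1 << (m + w))) * scale, q)
        triples.append((t, u, l - w))
    return triples


def simulate_signature(rng, alpha, q=Q, n=N):
    """Scalar-only stand-in for ECDSA: r is random instead of x(kG) mod q."""
    k = rng.randrange(1 << n, 1 << (n + 1))       # padded ephemeral key
    r, h = rng.randrange(1, q), rng.randrange(q)
    s = pow(k, -1, q) * (h + alpha * r) % q
    return k, r, s, h


rng = random.Random(5)
alpha = rng.randrange(1, Q)
k, r, s, h = simulate_signature(rng, alpha)
triples = hnp_triples(k, r, s, h)
print(len(triples), "triples carrying", sum(z for _, _, z in triples), "bits")
print(all(abs(centered_mod(alpha * t - u, Q)) < 2 ** (N - z - 1) for t, u, z in triples))
```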

4  Heuristic  Analysis 


Now we know how to derive the triples $(t_i, u_i, z_i)$ that are used to construct the lattice. The next obvious question is: how many do we need before we can retrieve the private key $\alpha$? Because the lattice attack relies on several heuristics, it is hard to give a definitive analysis. However, we give heuristic reasons here, similar to those for past results.

Each triple $(t_i, u_i, z_i)$ gives us $z_i$ bits of information. If this triple comes from a pair $(m, l)$ such that $p + 1 < m < n - l$, then $z_i = l - w$. From Section 3 we know that on average $l = w + 2$. Since the positions of the non-zero digits are independent of $p$, on average we lose half the distance between non-zero digits, or $(w + 2)/2$ bits, before the first usable triple and after the last usable triple, which leaves us with $n - 1 - (p + 2) - (w + 2)$ bits where our triples can be. The average number of triples is now given by $(n - p - 3 - (w + 2))/(w + 2)$, and each of these triples gives us $l - w = 2$ bits on average. Combining this yields $2 \cdot (n - p - 3 - (w + 2))/(w + 2) = 2 \cdot (n - p - 3)/(w + 2) - 2$ bits per signature. For the secp256k1 curve we have $n = 256$, $p = 129$ and $w = 3$, leading to 47.6 bits per signature on average. Our data obtained from perfect side channels associated with 1001 signatures gives an average of 47.6 with a 95% confidence interval of ±0.2664. For the secp521r1 curve, we have $n = 521$, $p = 259$ and $w = 4$, which suggests 84.33 bits per signature on average. The data average here is 84.1658 with a 95% confidence interval of ±0.3825. See also the $Z = 1$ cases of Figures 1 and 2, which show the distribution of the bits leaked per signature in the 256-bit and 521-bit cases, respectively.
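The heuristic count is easy to evaluate, and it can be compared with a simulation of perfect double and add chains. In the sketch below, `expected_bits` is the formula above, while `simulated_bits` assumes the padded key is uniform in $[2^n, 2^{n+1})$ and sums $l - w$ over all usable pairs; both names are hypothetical, and the simulation is not the authors' experimental setup.

```python
import random


def expected_bits(n, p, w):
    """Heuristic bits of information per signature: 2(n - p - 3)/(w + 2) - 2."""
    return 2 * (n - p - 3) / (w + 2) - 2


def centered_mod(x, m):
    """|x|_m: reduce x modulo m into [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r


def wnaf(k, w):
    """wNAF digits k_0, k_1, ... of k (Algorithm 2)."""
    digits, e = [], k
    while e > 0:
        d = centered_mod(e, 1 << (w + 1)) if e % 2 else 0
        digits.append(d)
        e = (e - d) // 2
    return digits


def simulated_bits(n, p, w, samples=500, seed=2):
    """Average of sum(l - w) over usable pairs (m, m + l), p + 1 < m < n - l,
    for random (n + 1)-bit scalars, i.e. perfect double and add chains."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        k = rng.randrange(1 << n, 1 << (n + 1))
        nz = [i for i, d in enumerate(wnaf(k, w)) if d != 0]
        total += sum(b - a - w for a, b in zip(nz, nz[1:]) if p + 1 < a < n - (b - a))
    return total / samples


print(round(expected_bits(256, 129, 3), 2), round(expected_bits(521, 259, 4), 2))
sim = simulated_bits(256, 129, 3)
print(f"simulated secp256k1 average: {sim:.2f} bits per signature")
```

The simulated secp256k1 average is expected to land close to the formula's 47.6, in line with the measured average reported above.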

This  formula  suggests  that  on  average,  six  signatures  would  be  enough  to  break  a  256-bit  key  (assuming  a  perfect 
side  channel),  since  47.6  •  6  =  285.6  >  256.  However,  in  our  preliminary  experiments  the  attack  did  not  succeed  once 
when  using  six  or  even  seven  signatures.  Even  eight  or  nine  signatures  gave  a  minimal  success  probability.  This 
indicates  that  something  is  wrong  with  the  heuristic.  In  general  there  are  two  possible  reasons  for  failure.  Either  the 
lattice problem has the correct solution but it was too hard to solve, or the solution to the lattice problem does not correspond to the private key $\alpha$. We will now examine these two possibilities and how to deal with them.


4.1  Hardness  of  the  lattice  problem 

Generally, the lattice problem becomes easier when more information is added to the lattice, but it also becomes harder as the rank increases. Since each triple adds information but also increases the rank of the lattice, it is not always clear whether adding more triples will solve the problem or make it worse. Each triple contributes $z_i$ bits of information, so we would always prefer triples with a higher $z_i$ value. Therefore, we set a bound $Z \ge 1$ and only keep those triples that have $z_i \ge Z$. However, this decreases the total number of bits of information we obtain per signature. If $Z$ is small enough, then roughly speaking we only keep a fraction $2^{1-Z}$ of the triples, but now each triple contributes $Z + 1$ bits on average. Hence, the new formula for the number of bits per signature becomes

$$2^{1-Z} \cdot (Z + 1) \cdot \left(\frac{n - p - 3}{w + 2} - 1\right).$$

Our data also reflects this formula, as can be seen in Figures 1 and 2 for the 256-bit and the 521-bit cases, respectively. In our experiments we set an additional bound $d$ on the total number of triples we use, which limits the lattice rank to $d + 1$. To this end, we sort the triples by $z_i$ in descending order and pick the first $d$ triples to construct the lattice. We adopt this approach for our experiments; the results can be found in Section 5.
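The filtered formula and the selection rule can be sketched as follows. This is a minimal illustration, not the authors' code: triples are modelled as a hypothetical `Triple(t, u, z)` record, and `bits_per_signature_filtered` and `select_triples` are illustrative names.

```python
from typing import List, NamedTuple


def bits_per_signature_filtered(n: int, p: int, w: int, Z: int) -> float:
    """Heuristic bits per signature when only triples with z >= Z are kept."""
    triples_per_sig = (n - p - 3) / (w + 2) - 1
    kept_fraction = 2 ** (1 - Z)   # rough fraction of triples with z >= Z
    bits_per_kept = Z + 1          # rough average information of a kept triple
    return kept_fraction * bits_per_kept * triples_per_sig


class Triple(NamedTuple):
    t: int
    u: int
    z: int  # bits of information carried by the triple


def select_triples(triples: List[Triple], Z: int, d: int) -> List[Triple]:
    """Keep triples with z >= Z, most informative first, and at most d of them."""
    kept = [tr for tr in triples if tr.z >= Z]
    kept.sort(key=lambda tr: tr.z, reverse=True)
    return kept[:d]


example = [Triple(11, 3, 1), Triple(7, 5, 5), Triple(2, 9, 3), Triple(4, 4, 4)]
chosen = select_triples(example, Z=3, d=2)
print([tr.z for tr in chosen])  # [5, 4]; lattice rank len(chosen) + 1 = 3
print(bits_per_signature_filtered(256, 129, 3, 3))
```

The two rough factors are exact if one assumes, purely for illustration, that $z$ is geometrically distributed with $P(z = j) = 2^{-j}$: then $P(z \ge Z) = 2^{1-Z}$ and the conditional mean of $z$ given $z \ge Z$ is $Z + 1$.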
Fig. 1: Number of signatures against bits per signature in the 256 bit case.

Fig. 2: Number of signatures against bits per signature in the 521 bit case.


4.2  Incorrect  solutions 


The analysis of Nguyen and Shparlinski [22] requires that the $t_i$ values in the triples are taken uniformly and independently from a distribution that satisfies some conditions. However, it is easy to see that when two triples are taken from the same signature, the values $t_{ij} = \lfloor r_i \cdot s_i^{-1} \cdot 2^{n-m_j-l_j-1} \rfloor_q$ and $t_{ij'} = \lfloor r_i \cdot s_i^{-1} \cdot 2^{n-m_{j'}-l_{j'}-1} \rfloor_q$ are not even independent, as they differ mod $q$ by a factor that is a power of 2 less than $2^n$.

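Recall from Sections 2.3 and 3 how the triples are used and created, respectively. Consider a triple $(t_{ij}, u_{ij}, z_{ij})$ corresponding to a signature $(r_i, s_i, h_i)$. The corresponding value $v_{ij} = |\alpha \cdot t_{ij} - u_{ij}|_q$ satisfies

$$v_{ij} = \left|\alpha \cdot r_i \cdot s_i^{-1} \cdot 2^{n-m_j-l_j-1} + \left(h_i \cdot s_i^{-1} - 2^{m_j+l_j}\right) \cdot 2^{n-m_j-l_j-1}\right|_q,$$

which is equivalent to

$$v_{ij} = \left|(\alpha \cdot r_i + h_i) \cdot s_i^{-1} \cdot 2^{n-m_j-l_j-1} - 2^{n-1}\right|_q < q/2^{z_{ij}+1},$$

where $p + 1 \le m_j \le n - l_j$ and $z_{ij} = l_j - w$. Now $(\alpha \cdot r_i + h_i) \cdot s_i^{-1} = k_i \bmod q$, and we know that the previous statement holds due to the structure of $k_i$: specifically, its bits $m_j + w, \ldots, m_j + l_j - 1$ repeat, and bit $m_j + l_j$ differs from the preceding bit. But the map $X \mapsto (X \cdot r_i + h_i) \cdot s_i^{-1}$ is a bijection mod $q$, and hence for each $i$ there will be many numbers $X$ such that for all $j$

$$v_{ij}(X) = \left|(X \cdot r_i + h_i) \cdot s_i^{-1} \cdot 2^{n-m_j-l_j-1} - 2^{n-1}\right|_q < q/2^{z_{ij}+1}.$$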
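The equivalence above, and the fact that $v_{ij}$ is small whenever the nonce has the run structure, can be checked numerically. The following is a minimal Python sketch under stated assumptions: $q$ is taken to be the secp256k1 group order from SEC 2; $r$ is drawn at random rather than computed from a curve point (the identity only uses arithmetic mod $q$); and `make_triple` is a hypothetical helper that builds $(t_{ij}, u_{ij}, z_{ij})$ in the form given above. `centered_abs` implements $|\cdot|_q$ as the distance to the nearest multiple of $q$.

```python
import random

# Assumption: q is the secp256k1 group order (SEC 2), so n = 256.
Q = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
N_BITS = 256
EPS = 2**N_BITS - Q  # q = 2^n - eps with eps < 2^129, hence p = 129


def centered_abs(x: int, q: int = Q) -> int:
    """|x|_q: distance from x to the nearest multiple of q."""
    x %= q
    return min(x, q - x)


def make_triple(r: int, s: int, h: int, m: int, l: int, w: int):
    """Hypothetical helper: (t, u, z) for nonce bits m+w..m+l-1 repeating
    and bit m+l flipped, using the shift 2^(n-m-l-1)."""
    shift = 2**(N_BITS - m - l - 1)
    s_inv = pow(s, -1, Q)  # modular inverse, Python 3.8+
    t = r * s_inv * shift % Q
    u = (2**(N_BITS - 1) - h * s_inv * shift) % Q
    return t, u, l - w


rng = random.Random(2015)
w, m, l = 3, 140, 10                      # satisfies p + 1 <= m <= n - l
shift = 2**(N_BITS - m - l - 1)
alpha = rng.randrange(1, Q)               # private key
h = rng.randrange(Q)                      # message hash
r = rng.randrange(1, Q)                   # stand-in for r, no curve arithmetic
k = rng.getrandbits(N_BITS - 1)           # nonce below 2^255 < q
k &= ~(((1 << (l - w)) - 1) << (m + w))   # bits m+w..m+l-1 all zero ...
k |= 1 << (m + l)                         # ... and bit m+l flipped to one
s = pow(k, -1, Q) * (h + alpha * r) % Q   # ECDSA: s = k^-1 (h + alpha r)

t, u, z = make_triple(r, s, h, m, l, w)
v = centered_abs(alpha * t - u)
print(v.bit_length(), N_BITS - 1 - z)     # v is at most about q / 2^(z+1)
```

In this sketch $v$ is bounded by $2^{n-1-z}$ plus an error of at most $2^{n-m-l-1} \cdot \varepsilon$ that comes from reducing modulo $q = 2^n - \varepsilon$ rather than $2^n$. Our reading is that the condition $m_j \ge p + 1$ is what keeps this error small compared with $q/2^{z_{ij}+1}$.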


Let $S_i = \{X : v_{ij}(X) < q/2^{z_{ij}+1} \text{ for all } j\}$. If there now exists an $X \in \bigcap_i S_i$ such that

$$X^2 + \sum_{i,j} \left(2^{z_{ij}} \cdot v_{ij}(X)\right)^2 < \alpha^2 + \sum_{i,j} \left(2^{z_{ij}} \cdot v_{ij}(\alpha)\right)^2,$$

then it is very unlikely that the lattice algorithm will find $\alpha$, because $X$ corresponds to a better solution to the lattice problem. This problem is more likely to arise when fewer signatures are used, because this leads to fewer distinct values for $(r_i, s_i, h_i)$ and hence fewer sets $S_i$ that need to intersect. This suggests that increasing the number of signatures could increase the success probability.

Assuming that the $S_i$ are random, we want to determine the probability that their intersection is non-empty. First we consider the size of the $S_i$. Recall that $S_i$ consists of all $X \bmod q$ such that $v_{ij}(X)$ has 'the same structure as $k_i$'. This means that for each triple specified by $m_j$ and $l_j$, the bits $m_j + w, \ldots, m_j + l_j - 1$ repeat, and bit $m_j + l_j$ is the opposite of the preceding bits. Since the repeated value may be either 0 or 1, there are approximately $2^{n-(l_j-w)}$ numbers mod $q$ that have this structure. Let $f_i$ be the number of triples of signature $i$ and $g_{ij} = (l_j - w + 1)$ be the number of bits fixed by triple $j$ of signature $i$. Then, because the triples do not overlap and because the map $X \mapsto (X \cdot r_i + h_i) \cdot s_i^{-1}$ is a bijection, we have that

$$\log_2(|S_i|) = n - \sum_{j=1}^{f_i} (g_{ij} - 1) = n + f_i - \sum_{j=1}^{f_i} g_{ij}.$$

Write $|S_i|$ for the size of $S_i$ and assume that the $S_i$ are chosen randomly and independently from all subsets of size $|S_i|$ of the integers in the range $[0, \ldots, N - 1]$, where $N = 2^n$. Consider the probability

$$p_i = P(0 \in S_i) = |S_i|/N,$$

which follows since $S_i$ is randomly chosen. Now, because the $S_i$ are also chosen independently, we have

$$P\left(0 \in \bigcap_i S_i\right) = \prod_i p_i.$$

Finally, since this argument holds for any $j \in [0, \ldots, N - 1]$, we can apply the union bound to obtain

$$p_{\mathrm{fail}} = P\left(\bigcup_j \left(j \in \bigcap_i S_i\right)\right) \le \sum_j P\left(j \in \bigcap_i S_i\right) = N \cdot \prod_i p_i. \qquad (8)$$
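A small simulation makes the union-bound argument concrete. The sketch below works at toy scale ($N = 2^{12}$ instead of $2^{256}$). It draws $k$ independent, uniformly random subsets of size $|S_i| = N \cdot p_i$, estimates how often they share an element, and compares the estimate with the right-hand side of (8). The function names and parameters are illustrative assumptions.

```python
import random


def union_bound(N: int, set_size: int, k: int) -> float:
    """Right-hand side of (8): N * prod_i p_i with every p_i = set_size / N."""
    return N * (set_size / N) ** k


def estimate_nonempty_intersection(N: int, set_size: int, k: int,
                                   trials: int, seed: int = 1) -> float:
    """Fraction of trials in which k independent, uniformly random subsets
    of range(N), each of size set_size, have a common element."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        common = set(rng.sample(range(N), set_size))
        for _ in range(k - 1):
            common &= set(rng.sample(range(N), set_size))
            if not common:
                break
        hits += bool(common)
    return hits / trials


N, size = 2**12, 2**8  # toy scale: every p_i = 2^-4
for k in (3, 4, 5):
    print(k, union_bound(N, size, k),
          estimate_nonempty_intersection(N, size, k, 2000))
```

The bound equals the expected size of the intersection, so the true probability can never exceed it. At $k = 3$ the bound is 1 and uninformative, but from $k = 4$ onwards it drops below one. The estimates agree with this up to sampling noise.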

Recall that each signature has $f_i = 2^{1-Z} \cdot ((n - p - 3)/(w + 2) - 1)$ triples on average and each triple contributes $Z + 1$ bits on average, which means $g_{ij} = Z + 2$ on average. If we plug in the numbers $n = 256$, $p = 129$, $w = 3$ and $Z = 3$, we get $f_i \approx 6$, $g_{ij} = 5$ and hence $p_i \approx 2^{6 - 6 \cdot 5} = 2^{-24}$, if we assume an average number of triples and bits in each signature. With $k$ signatures, this in turn gives an upper bound of $p_{\mathrm{fail}} \le N/2^{24k} = 2^{256 - 24k}$. If $k \ge 11$, this upper bound is less than one, which suggests that from about eleven signatures upwards we should succeed with some probability; this is indeed what we observe in our experiments.

Repeating this for $n = 521$, $p = 259$, $w = 4$ and $Z = 4$, we obtain $f_i \approx 5$, $g_{ij} = 6$ and hence $p_i \approx 2^{5 - 5 \cdot 6} = 2^{-25}$. Consequently, $p_{\mathrm{fail}} \le N/2^{25k} = 2^{521 - 25k}$, which is less than one when $k \ge 21$. However, in our experiments we require at least 30 signatures to obtain the secret key with some probability, so the above analysis becomes less accurate as the key length increases.
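The two numerical instances above follow the same recipe, which can be written out as a short Python sketch. As in the text, it treats every signature as average and rounds $f_i$ to the nearest integer. The name `heuristic_bound` is illustrative.

```python
def heuristic_bound(n: int, p: int, w: int, Z: int):
    """Average f_i, g_ij, log2(p_i) and the smallest k with N * p_i^k < 1."""
    f = round(2 ** (1 - Z) * ((n - p - 3) / (w + 2) - 1))  # triples per signature
    g = Z + 2                                               # bits fixed per triple
    log2_p = f - f * g                                      # log2|S_i| - n
    # p_fail <= N * p_i^k = 2^(n + k * log2_p), below one once k * (-log2_p) > n
    k_min = n // (-log2_p) + 1
    return f, g, log2_p, k_min


for n, p, w, Z in [(256, 129, 3, 3), (521, 259, 4, 4)]:
    print(n, heuristic_bound(n, p, w, Z))
# 256 (6, 5, -24, 11)
# 521 (5, 6, -25, 21)
```

The gap between the predicted 21 and the observed 30 signatures for the 521-bit key is not visible in this calculation, which is consistent with the remark that the analysis is only approximate.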

5 Results With a Perfect Side-Channel

Subsection 2.3 outlined our (heuristic) approach to obtain the secret key from a number of triples $(t_i, u_i, z_i)$ using lattices, and Section 3 outlined how to generate these triples from the side-channel information. In this section we look at some experimental results to see whether our heuristic assumptions are justified.

As per Section 4, we used the following approach for our experiments. First, we fix a number of signatures $s$, a maximum number of triples $d$ (so that the lattice rank is at most $d + 1$) and a bound $Z$. We then take $s$ signatures at random from our data set and derive all triples such that $z_i \ge Z$, sorting them such that the $z_i$ are in descending order. If we have more than $d$ triples, we only take the first $d$ to construct the lattice. Finally, we attempt to solve the lattice problem and record the result. All executions were performed in a single thread on an Intel Core i7-3770S CPU running at 3.10 GHz.

When  solving  the  CVP  instances  there  are  three  possible  outcomes.  We  obtain  either  no  solution,  the  private  key  or 
a  wrong  solution.  No  solution  means  that  the  lattice  problem  was  too  hard  for  the  algorithm  and  constraints  we  used, 
but  spending  more  time  and  using  stronger  algorithms  might  still  solve  it.  When  a  ‘wrong’  solution  is  obtained,  this 
means  that  our  heuristics  failed:  the  solution  vector  was  not  unique,  in  the  sense  that  there  were  other  lattice  vectors 
within  the  expected  distance  from  our  target  vector. 
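The experimental procedure and its three possible outcomes can be summarised as a small harness. This is a minimal Python sketch, not the code used for the experiments. Signatures are assumed to be given as lists of already-derived `(t, u, z)` triples. `solve` is a hypothetical callback standing in for the BKZ/enumeration step, returning a candidate key or `None`. `run_trial` and `success_rate` are illustrative names.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]                      # (t, u, z)
Solver = Callable[[List[Triple]], Optional[int]]   # candidate key or None


def run_trial(signatures: Sequence[List[Triple]], s: int, d: int, Z: int,
              solve: Solver, secret: int, rng: random.Random) -> str:
    """One experiment; returns 'key', 'wrong' or 'none'."""
    chosen = rng.sample(list(signatures), s)            # s random signatures
    triples = [tr for sig in chosen for tr in sig if tr[2] >= Z]
    triples.sort(key=lambda tr: tr[2], reverse=True)    # largest z first
    candidate = solve(triples[:d])                      # at most d triples
    if candidate is None:
        return "none"      # too hard for the algorithm and limits used
    return "key" if candidate == secret else "wrong"   # wrong: heuristics failed


def success_rate(signatures: Sequence[List[Triple]], s: int, d: int, Z: int,
                 solve: Solver, secret: int, trials: int, seed: int = 0) -> float:
    """Fraction of independent trials that recover the secret key."""
    rng = random.Random(seed)
    outcomes = [run_trial(signatures, s, d, Z, solve, secret, rng)
                for _ in range(trials)]
    return outcomes.count("key") / trials
```

For the SVP variant, the solver never reports "no solution" separately, so `'none'` and `'wrong'` together simply count as failures.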

When  solving  the  SVP  instance  there  are  only  two  outcomes.  Either  we  obtain  the  private  key  or  not.  However, 
in  this  case  it  is  not  as  clear  whether  a  wrong  solution  means  that  there  were  other  solutions  due  to  the  additional 
heuristics  involved.  The  complete  details  of  our  experimental  data  are  given  in  the  Appendix. 
5.1 256 bit key


For the 256 bit case, we used BKZ with block size 20 from fplll [6] to solve the SVP instances, as well as to pre-process the CVP instances. To solve the CVP, we applied Schnorr-Euchner enumeration [23] using linear pruning [12]
and  limiting  the  number  of  enumerated  nodes  to  2^®. 

The CVP approach seems the best, as the lattice rank (d + 1) remains quite small. We restrict our triples to Z = 3
to  keep  the  rank  small,  but  a  smaller  Z  would  not  improve  our  results  much.  See  the  appendix  for  details.  We  observed 
that  failures  are  mostly  caused  by  ‘wrong’  solutions  in  this  case,  rather  than  the  lattice  problem  being  too  hard.  In 
all  cases  we  found  that  using  75  triples  gave  the  best  results.  Table  2  in  the  Appendix  lists  the  runtimes  and  success 
probabilities  of  the  lattice  part  of  the  attack  for  varying  s.  The  results  are  graphically  presented  in  Figures  4  and  5  in 
the  Appendix. 

5.2  521  bit  key 

For  the  521  bit  case,  we  used  BKZ  with  block  size  20  from  fplll  [6]  to  solve  the  SVP  instances.  Due  to  the  higher 
lattice  ranks  in  this  case,  solving  the  CVP  instances  proved  much  less  efficient,  even  when  restricting  the  triples  to 
Z  =  4. 

With  30  signatures  we  get  a  small  probability  of  success  in  the  lattice  attack  whereas  with  40  signatures  we  can 
obtain  the  secret  key  in  more  than  half  of  the  cases.  It  should  be  noted  that  as  the  number  of  signatures  increases,  the 
choice of d becomes less important, because the number of triples with more information increases. See Table 4 in the Appendix for details and Figures 6 and 7 for a graphical representation.

6  Results  in  a  Real-Life  Attack 

So far our discussion was based on the assumption of a perfect side-channel. That is, we assumed that the double-and-add chains are recovered without any errors. Perfect side-channels are, however, very rare. In this section we extend the results to the actual side-channel exposed by the Flush+Reload technique.

The attack was carried out on an HP Elite 8300 running CentOS 6.5. The victim process runs OpenSSL 1.0.1f, compiled to include debugging symbols. These symbols are not used at run-time and do not affect the performance of OpenSSL. We use them because they let us find the addresses to probe without resorting to reverse engineering [9]. The spy uses a time slot of 1,200 cycles (0.375 µs). In each time slot it probes the memory lines containing the last field multiplication within the group add and double functions (ec_GFp_simple_add and ec_GFp_simple_dbl, respectively). Memory lines that contain function calls are accessed both before and after the call, reducing the chance of a spy missing the access due to overlap with the probe. Monitoring code close to the end of the function eliminates false positives due to speculative execution. See Yarom and Falkner [27] for a discussion of overlaps and speculative execution.
Fig. 3: Flush+Reload spy output. Vertical bars indicate time-slot boundaries; ‘A’ and ‘D’ are probes for OpenSSL access to add and double; dashes indicate missed time-slots.


Figure 3 shows an example of the output of the spy when OpenSSL signs using secp256k1. The double and three
addition  operations  at  the  beginning  of  the  captured  sequence  are  the  calculation  of  the  pre-computed  wNAF  digits. 
Note  the  repeated  capture  of  the  double  and  add  operations  due  to  monitoring  a  memory  line  that  contains  a  function 
call.  The  actual  wNAF  multiplication  starts  closer  to  the  end  of  the  line,  with  7  double  operations  followed  by  a  group 
addition. 
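To see why the double-and-add chain reveals the positions of the nonzero wNAF digits, consider the simplified Python model below. It is not OpenSSL's ec_wNAF_mul code. It assumes a width-(w+1) NAF, in which nonzero digits are odd with |d| < 2^w and are followed by at least w zeros (giving an average digit spacing of w + 2). It also assumes that the top digit only loads a precomputed point and that the chain is captured without errors. All function names are illustrative.

```python
from typing import List


def wnaf(k: int, w: int) -> List[int]:
    """Width-(w+1) NAF of k > 0, least significant digit first.
    Nonzero digits are odd with |d| < 2**w and are followed by >= w zeros."""
    digits = []
    modulus = 1 << (w + 1)
    while k > 0:
        if k & 1:
            d = k % modulus
            if d >= (1 << w):
                d -= modulus  # choose the negative representative
            k -= d            # k is now divisible by 2**(w+1)
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def double_and_add_chain(digits: List[int]) -> str:
    """Left-to-right scan: the top digit only loads a precomputed point;
    every lower digit costs one 'D', plus one 'A' if it is nonzero."""
    ops = []
    for i in range(len(digits) - 2, -1, -1):
        ops.append("D")
        if digits[i] != 0:
            ops.append("A")
    return "".join(ops)


def nonzero_positions(chain: str) -> List[int]:
    """Positions of nonzero digits (most significant first), recovered from an
    error-free chain by counting the doublings that remain after each add."""
    remaining = chain.count("D")
    positions = [remaining]  # the top digit is always nonzero
    for op in chain:
        if op == "D":
            remaining -= 1
        else:
            positions.append(remaining)
    return positions


digits = wnaf(0xC0FFEE, 3)
chain = double_and_add_chain(digits)
print(chain)
print(nonzero_positions(chain))
```

In this model every 'A' pins down a nonzero digit position, and each run of doublings between two additions is the kind of zero run from which a triple is built. A single missed operation shifts the count of remaining doublings, which is why the positions of operations preceding a missed time-slot are lost.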
In this example, the attack captures most of the double and add chain. It does, however, miss a few time-slots
and  consequently  a  few  group  operations  in  the  chain.  The  spy  recognises  missed  time-slots  by  noting  inexplicable 
gaps  in  the  processor  cycle  counter.  As  we  do  not  know  which  operations  are  missed,  we  lose  the  bit  positions  of  the 
operations  that  precede  the  missed  time-slots.  We  believe  that  the  missed  time-slots  are  due  to  system  activity  which 
suspends  the  spy. 

Occasionally  OpenSSL  suspends  the  calculation  of  the  scalar  multiplication  to  perform  memory  management 
functions.  These  suspends  confuse  our  spy  program,  which  assumes  that  the  scalar  multiplication  terminated.  This,  in 
turn,  results  in  a  short  capture,  which  cannot  be  used  for  the  lattice  attack. 

To test the prevalence of capture errors, we captured 1,000 scalar multiplications and compared the capture results to the ground truth. 342 of these captures contained missed time-slots. Another 77 captures contain fewer than 250 group operations and are, therefore, too short. Of the remaining 581 captures, 577 are perfect, while only four contain errors that we could not easily filter out.

Recall from Section 5 that 13 perfectly captured signatures are sufficient for breaking the key of a 256-bit curve with over 50% probability. An attacker using Flush+Reload to capture 25 signatures can thus expect to filter out 11 that contain obvious errors, leaving 14 that contain no obvious errors. Since each of these 14 captures contains an error with probability less than 1%, the probability that more than one of them contains an error is also less than 1%. Hence, the attacker only needs to test all combinations of 13 captures out of these 14 to achieve a 50% probability of breaking the signing key.
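The arithmetic in the two preceding paragraphs can be checked with a short Python sketch. The counts come from the text. Our assumptions are that the expected number of usable captures is rounded down (14.525 to 14) and that errors in the usable captures are modelled as independent, each with probability 4/581.

```python
from math import comb

# Ground-truth comparison of 1,000 captured scalar multiplications
total, missed_slots, too_short = 1000, 342, 77
remaining = total - missed_slots - too_short        # 581
perfect = 577
p_err = (remaining - perfect) / remaining           # 4/581, about 0.69%

captured = 25
usable = captured * remaining // total              # floor(14.525) = 14
filtered = captured - usable                        # 11

# Assumption: errors are independent, each with probability p_err
p_at_most_one = ((1 - p_err) ** usable
                 + usable * p_err * (1 - p_err) ** (usable - 1))
p_more_than_one = 1 - p_at_most_one

needed = 13                                         # perfect captures needed
subsets = comb(usable, needed)                      # lattice runs to try

print(f"usable {usable}, filtered {filtered}, p_err {p_err:.4f}, "
      f"P(>1 error) {p_more_than_one:.4f}, subsets {subsets}")
```

If at most one of the 14 captures is faulty, at least one of the 14 subsets of size 13 is entirely error-free. Running the lattice attack on each subset therefore inherits the success probability quoted for 13 perfect signatures.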

Several optimisations can be used to improve the figure of 25 signatures. Some missed slots can be recovered, and the spy can be improved to correct short captures. Even without these improvements, this figure is an order of magnitude smaller than the previously best known result of 200 signatures [2], where 200 signatures correspond to a 3.5% probability of breaking the signing key, and 300 signatures were required to get a success probability greater than 50%.


Acknowledgements 

The authors would like to thank Ben Sach for helpful conversations during the course of this work. The first and second authors' work has been supported in part by ERC Advanced Grant ERC-2010-AdG-267188-CRIPTO, by EPSRC via grant EP/I03126X, and by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under agreement number FA8750-11-2-0079*.

The third author wishes to thank Dr Katrina Falkner for her advice and support and the Defence Science and
Technology  Organisation  (DSTO)  Maritime  Division,  Australia,  who  partially  funded  his  work. 


References 

1.  American  National  Standards  Institute.  ANSI  X9.62,  Public  Key  Cryptography  for  the  Financial  Services  Industry:  The  Elliptic 
Curve  Digital  Signature  Algorithm,  1999. 

2. Naomi Benger, Joop van de Pol, Nigel P. Smart, and Yuval Yarom. “Ooh aah..., just a little bit”: A small amount of side channel can go a long way. In Lejla Batina and Matthew Robshaw, editors, Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems, volume 8731 of Lecture Notes in Computer Science, pages 75-92, Busan, Korea, September 2014. Springer.

3. Dan Boneh and Ramarathnam Venkatesan. Hardness of computing the most significant bits of secret keys in Diffie-Hellman and related schemes. In CRYPTO, volume 1109 of Lecture Notes in Computer Science, pages 129-142, 1996.

4. Billy Bob Brumley and Risto M. Hakala. Cache-timing template attacks. In Mitsuru Matsui, editor, Advances in Cryptology - ASIACRYPT 2009, volume 5912 of Lecture Notes in Computer Science, pages 667-684. Springer-Verlag, 2009.

5. Billy Bob Brumley and Nicola Tuveri. Remote timing attacks are still practical. In Vijay Atluri and Claudia Diaz, editors, Computer Security - ESORICS 2011, pages 355-371, Leuven, Belgium, September 2011.

* The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright
notation  thereon.  The  views  and  conclusions  contained  herein  are  those  of  the  authors  and  should  not  be  interpreted  as  necessarily 
representing  the  official  policies  or  endorsements,  either  expressed  or  implied,  of  Defense  Advanced  Research  Projects  Agency 
(DARPA)  or  the  U.S.  Government. 




APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 


6. David Cadé, Xavier Pujol, and Damien Stehlé. FPLLL-4.0.4. http://perso.ens-lyon.fr/damien.stehle/fplll/, 2013.

7.  Certicom  Research.  SEC  2:  Recommended  Elliptic  Curve  Domain  Parameters,  Version  2.0,  January  2010. 

8. Mathieu Ciet and Marc Joye. (Virtually) free randomization techniques for elliptic curve cryptography. In Sihan Qing, Dieter Gollmann, and Jianying Zhou, editors, Proceedings of the Fifth International Conference on Information and Communications Security, volume 2836 of Lecture Notes in Computer Science, pages 348-359, Huhehaote, China, October 2003. Springer.

9. Teodoro Cipresso and Mark Stamp. Software reverse engineering. In Peter Stavroulakis and Mark Stamp, editors, Handbook of Information and Communication Security, chapter 31, pages 659-696. Springer, 2010.

10.  Richard  E.  Crandall.  Method  and  apparatus  for  public  key  exchange  in  a  cryptographic  system.  US  Patent  5,159,632,  October 
1992. 

11. Benoît Feix, Mylène Roussellet, and Alexandre Venelli. Side-channel analysis on blinded regular scalar multiplications. In Willi Meier and Debdeep Mukhopadhyay, editors, Proceedings of the 15th International Conference on Cryptology in India (INDOCRYPT 2014), Lecture Notes in Computer Science, New Delhi, India, December 2014. Springer.

12. Nicolas Gama, Phong Q. Nguyen, and Oded Regev. Lattice enumeration using extreme pruning. In EUROCRYPT, volume 6110 of Lecture Notes in Computer Science, pages 257-278, 2010.

13. Daniel M. Gordon. A survey of fast exponentiation methods. Journal of Algorithms, 27(1):129-146, April 1998.

14.  Nick  Howgrave-Graham  and  Nigel  P.  Smart.  Lattice  attacks  on  digital  signature  schemes.  Designs,  Codes  and  Cryptography, 
23(3):283-290,  2001. 

15.  Avi  Kivity,  Yaniv  Kamay,  Dor  Laor,  Uri  Lublin,  and  Anthony  Liguori.  kvm:  the  Linux  virtual  machine  monitor.  In  Proceedings 
of  the  Linux  Symposium,  volume  one,  pages  225-230,  Ottawa,  Ontario,  Canada,  June  2007. 

16. Arjen K. Lenstra, Hendrik W. Lenstra, and László Lovász. Factoring polynomials with rational coefficients. Mathematische Annalen, 261(4):515-534, 1982.

17.  Lixin  Li,  James  E.  Just,  and  R.  Sekar.  Address-space  randomization  for  Windows  systems.  In  Proceedings  of  the  22nd  Annual 
Computer  Security  Applications  Conference,  pages  329-338,  Miami  Beach,  Florida,  United  States,  December  2006. 

18. Bodo Möller. Parallelizable elliptic curve point multiplication method with resistance against side-channel attacks. In Agnes Hui Chan and Virgil D. Gligor, editors, Proceedings of the Fifth International Conference on Information Security, number 2433 in Lecture Notes in Computer Science, pages 402-413, São Paulo, Brazil, September 2002.

19. Bodo Möller. Improved techniques for fast exponentiation. In P. J. Lee and C. H. Lim, editors, Information Security and Cryptology - ICISC 2002, number 2587 in Lecture Notes in Computer Science, pages 298-312. Springer-Verlag, 2003.

20. National Institute of Standards and Technology. FIPS PUB 186-4 Digital Signature Standard (DSS), 2013.

21.  Phong  Q.  Nguyen  and  Igor  E.  Shparlinski.  The  insecurity  of  the  digital  signature  algorithm  with  partially  known  nonces. 
Journal  of  Cryptology,  15(3):151-176,  June  2002. 

22.  Phong  Q.  Nguyen  and  Igor  E.  Shparlinski.  The  insecurity  of  the  elliptic  curve  digital  signature  algorithm  with  partially  known 
nonces.  Designs,  Codes  and  Cryptography,  30(2):201-217,  September  2003. 

23.  Claus-Peter  Schnorr  and  M.  Euchner.  Lattice  basis  reduction:  Improved  practical  algorithms  and  solving  subset  sum  problems. 
In  Fundamentals  of  Computation  Theory  -  FCT  1991,  volume  529  of  Lecture  Notes  in  Computer  Science,  pages  68-85. 
Springer,  1991. 

24.  Jerome  A.  Solinas.  Generalized  Mersenne  numbers.  Technical  Report  CORR-39,  University  of  Waterloo,  1999. 

25. Carl A. Waldspurger. Memory resource management in VMware ESX Server. In David E. Culler and Peter Druschel, editors, Proceedings of the Fifth Symposium on Operating Systems Design and Implementation, pages 181-194, Boston, Massachusetts, United States, December 2002.

26. Yuval Yarom and Naomi Benger. Recovering OpenSSL ECDSA nonces using the Flush+Reload cache side-channel attack. Cryptology ePrint Archive, Report 2014/140, February 2014. http://eprint.iacr.org/.

27. Yuval Yarom and Katrina Falkner. Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In Proceedings of the 23rd USENIX Security Symposium, pages 719-732, San Diego, California, United States, August 2014.




A  Experimental  results 
A.1  256  Bit  Keys 
Table 1: Results for d triples taken from s signatures with a 256-bit key (Z = 3)
Table 2: CVP results for 75 triples taken from s signatures with a 256-bit key (Z = 3)


A.2  521  Bit  Keys 
Table 3: SVP results for d triples taken from s signatures with a 521-bit key (Z = 4)
Table 4: SVP results for d triples taken from s signatures with a 521-bit key (Z = 4)
Fig. 4: Success probability per number of signatures against a 256 bit key

Fig. 5: Expected running time per number of signatures against a 256 bit key

Fig. 6: Success probability per number of signatures against a 521 bit key

Fig. 7: Expected running time per number of signatures against a 521 bit key